Android System Debugging Tips (4): Debugging System Exceptions
1. Preface
This post summarizes several debugging methods commonly used when the system runs into exceptions.
2. Debuggerd
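debuggerd can be used together with echo t > /proc/sysrq-trigger to debug deadlocks and sleeping (blocked) threads in both user space and kernel space. debuggerd -b <pid> dumps the user-space backtraces of all threads in a process. sysrq t makes the kernel dump the state and kernel stack of every task to the kernel log, which you can read with dmesg.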
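Here is a minimal sketch that captures both views in one pass from a host machine. It assumes that `adb` is on the PATH and that the device is a rooted/userdebug build where `adb shell` runs as root, because writing to /proc/sysrq-trigger and reading dmesg normally need root. The helper names are hypothetical. The `run` parameter exists only so that the commands can be swapped for a fake in tests.

```python
import subprocess
from typing import Callable, List

def hang_evidence_commands(pid: int) -> List[List[str]]:
    """Hypothetical helper: adb commands that capture user- and kernel-space stacks."""
    return [
        # User-space backtraces of every thread in the target process.
        ["adb", "shell", "debuggerd", "-b", str(pid)],
        # sysrq 't' makes the kernel dump all tasks (state + kernel stack) to its log.
        # The redirection is executed by the shell on the device, not on the host.
        ["adb", "shell", "echo t > /proc/sysrq-trigger"],
        # Read back the kernel log that now contains the sysrq-t dump.
        ["adb", "shell", "dmesg"],
    ]

def collect_hang_evidence(
    pid: int,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[str]:
    """Run each command in order and return their stdout (empty output is kept)."""
    outputs = []
    for cmd in hang_evidence_commands(pid):
        result = run(cmd, capture_output=True, text=True, check=False)
        outputs.append(result.stdout)
    return outputs
```

After calling `collect_hang_evidence(1364)`, look up the same thread ID in the debuggerd output and in the dmesg output. Together they show where that thread is stuck in user space and in the kernel.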
3. The kill command
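kill -6
Sending signal 6 (SIGABRT) to a process dumps its native backtrace. This works for any process. The data is saved to /data/tombstones in the rotating files tombstone_00 through tombstone_09, and a copy is also written to /data/anr/traces.txt. The result is the same as the dump produced by debuggerd. Note that SIGABRT terminates the target process.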
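kill -3
Sending signal 3 (SIGQUIT) to a process forked from Zygote (a Java process) dumps the Java stack traces of its threads, and the process keeps running. The data is written only to /data/anr/traces.txt. It matches the traces.txt dumped after an ANR is detected by AMS or the watchdog.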
4. strace
usage: strace [-CdffhiqrtttTvVxxy] [-I n] [-e expr]...
              [-a column] [-o file] [-s strsize] [-P path]...
              -p pid... / [-D] [-E var=val]... [-u username] PROG [ARGS]
   or: strace -c[df] [-I n] [-e expr]... [-O overhead] [-S sortby]
              -p pid... / [-D] [-E var=val]... [-u username] PROG [ARGS]
-c -- count time, calls, and errors for each syscall and report summary
-C -- like -c but also print regular output
-w -- summarise syscall latency (default is system time)
-d -- enable debug output to stderr
-D -- run tracer process as a detached grandchild, not as parent
-f -- follow forks, -ff -- with output into separate files
-i -- print instruction pointer at time of syscall
-q -- suppress messages about attaching, detaching, etc.
-r -- print relative timestamp, -t -- absolute timestamp, -tt -- with usecs
-T -- print time spent in each syscall
-v -- verbose mode: print unabbreviated argv, stat, termios, etc. args
-x -- print non-ascii strings in hex, -xx -- print all strings in hex
-y -- print paths associated with file descriptor arguments
-h -- print help message, -V -- print version
-a column -- alignment COLUMN for printing syscall results (default 40)
-b execve -- detach on this syscall
-e expr -- a qualifying expression: option=[!]all or option=[!]val1[,val2]...
   options: trace, abbrev, verbose, raw, signal, read, write
-I interruptible --
   1: no signals are blocked
   2: fatal signals are blocked while decoding syscall (default)
   3: fatal signals are always blocked (default if '-o FILE PROG')
   4: fatal signals and SIGTSTP (^Z) are always blocked
      (useful to make 'strace -o FILE PROG' not stop on ^Z)
-o file -- send trace output to FILE instead of stderr
-O overhead -- set overhead for tracing syscalls to OVERHEAD usecs
-p pid -- trace process with process id PID, may be repeated
-s strsize -- limit length of print strings to STRSIZE chars (default 32)
-S sortby -- sort syscall counts by: time, calls, name, nothing (default time)
-u username -- run command as username handling setuid and/or setgid
-E var=val -- put var=val in the environment for command
-E var -- remove var from the environment for command
-P path -- trace accesses to path
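strace -Ff -p 1364 -T    (attach to PID 1364 and all its threads; show the time spent in each syscall)
strace -Ff -p 1364 -T -r    (add relative timestamps)
strace -Ff -p 1364 -T -t    (add absolute timestamps)
or strace -Ff -p 1364 -T -tt    (absolute timestamps with microseconds)
strace -Ff -p 1364 -T -tt -o /data/strace.log    (write the trace to a file instead of stderr)
strace -Ff -p 1364 -c    (summary of time, calls and errors per syscall)
strace -Ff -p 1364 -c -w    (summary based on wall-clock syscall latency, i.e. including time spent waiting)
strace -Ff -p 1364 -y -tt -T    (also print the file path behind each file descriptor argument)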
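With -tt and -T, every completed syscall line carries a wall-clock timestamp and ends with its duration in angle brackets. That makes it possible to pull out the slowest calls mechanically, for example from a pulled copy of /data/strace.log. The minimal Python sketch below assumes the common strace text format:
- an optional `[pid N]` prefix (stderr output with -f) or a bare `N ` prefix (-o output with -f);
- `<unfinished ...>` lines that carry no duration;
- `<... name resumed>` lines that carry the duration of the interrupted call.

Output produced with -r is not handled. The names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

@dataclass
class SyscallRecord:
    pid: Optional[int]   # None when the line has no pid prefix
    timestamp: str       # -tt wall-clock time, e.g. "12:00:05.000300"
    name: str            # syscall name
    seconds: float       # -T duration, the trailing <...> value

LINE_RE = re.compile(
    r"^(?:\[pid\s+(?P<pid1>\d+)\]\s+|(?P<pid2>\d+)\s+)?"    # optional pid prefix
    r"(?P<ts>\d{2}:\d{2}:\d{2}\.\d+)\s+"                    # -tt timestamp
    r"(?:<\.\.\.\s+(?P<resumed>\w+)\s+resumed>|(?P<name>\w+)\()"
    r".*<(?P<dur>\d+\.\d+)>\s*$"                              # -T duration
)

def slowest_syscalls(lines: Iterable[str], top: int = 5) -> List[SyscallRecord]:
    """Return the `top` longest syscalls; lines without a duration are skipped."""
    records = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # e.g. "<unfinished ...>", signal and exit notices
        pid = m.group("pid1") or m.group("pid2")
        records.append(SyscallRecord(
            pid=int(pid) if pid else None,
            timestamp=m.group("ts"),
            name=m.group("resumed") or m.group("name"),
            seconds=float(m.group("dur")),
        ))
    records.sort(key=lambda r: r.seconds, reverse=True)
    return records[:top]
```

A long futex or binder ioctl at the top of this list is usually the first clue when looking for the blocked thread in the ANR analysis.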
5. Application process ANR
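(1) First, use strace -fF -p {$PID} to confirm the state of the specific thread involved in the ANR.
(2) Use debuggerd -b {$PID} to check the backtrace of every thread.
(3) Asynchronous wait
The ANR thread A:A1 (process A, thread A1) is waiting for thread A:B1 in the same process to finish a task. Next, trace the state of A:B1 with strace.
(4) Synchronous sleep
The ANR thread A:A1 is waiting for a lock. Check which thread holds that lock.
(5) Blocked system call
The ANR thread A:A1 is sleeping inside a system call. Dump the process's kernel-space stacks to find out why the call sleeps, for example with echo t > /proc/sysrq-trigger or, as root, cat /proc/<pid>/task/<tid>/stack.
(6) Inter-process communication wait
The ANR thread A:A1 is sleeping during a binder IPC call. Use the binder thread state in the current process's binder proc entry to work out who is waiting for whom. For example, if thread A:A1 waits for thread B:B1, check the state of B:B1 with strace or debuggerd.
Analyze B:B1 the same way, deciding whether it is in an asynchronous wait, a synchronous sleep, a blocked system call, or an IPC wait.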
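Following the binder wait relationship from thread to thread can be automated once the binder state has been dumped to text. The sketch below assumes that the binder debugfs dump (for example /sys/kernel/debug/binder/transactions or the per-process .../binder/proc/<pid> file; the mount point varies between devices) contains lines of the form `outgoing transaction N: <ptr> from PID:TID to PID:TID code ...`. It only uses those outgoing lines. All function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

Thread = Tuple[int, int]  # (pid, tid), i.e. the article's "A:A1" notation

# Assumed line shape, e.g.
# "outgoing transaction 2805: 0000000000000000 from 1364:1380 to 612:700 code 3 ..."
OUTGOING_RE = re.compile(r"outgoing transaction \d+:.*?\bfrom (\d+):(\d+) to (\d+):(\d+)")

def parse_waits(lines: Iterable[str]) -> Dict[Thread, Thread]:
    """Map each sending thread to the thread its outgoing transaction targets."""
    waits: Dict[Thread, Thread] = {}
    for line in lines:
        m = OUTGOING_RE.search(line)
        if m:
            from_pid, from_tid, to_pid, to_tid = map(int, m.groups())
            waits[(from_pid, from_tid)] = (to_pid, to_tid)
    return waits

def wait_chain(start: Thread, waits: Dict[Thread, Thread]) -> List[Thread]:
    """Follow A:A1 -> B:B1 -> ... until a thread is not waiting or a cycle appears.

    A target tid of 0 typically means no binder thread in the target process
    has picked the transaction up yet.
    """
    chain = [start]
    seen = {start}
    current = start
    while current in waits:
        current = waits[current]
        chain.append(current)
        if current in seen:
            break  # cycle: threads waiting on each other, a likely deadlock
        seen.add(current)
    return chain
```

The last thread in the chain is the one to examine next with strace or debuggerd. Classify it again as an asynchronous wait, a synchronous sleep, a blocked system call, or another IPC wait.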
6. Monkey stability issues
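How to approach monkey problems: when a monkey test stops, there are essentially only two possible causes:
- a) the system restarted abnormally;
- b) the kernel's OOM killer killed monkey while reclaiming memory (a memory leak).
1. On Android it is almost always case a), an abnormal system restart. Case a) comes in several types:
1) A critical native system process aborts, and its parent process, init, kills all child processes and restarts the system.
2) The system_server watchdog detects a hang (an ANR) and kills system_server. Zygote sees that its child system_server has exited and kills itself. init then sees that its child Zygote has exited, kills all child processes, and restarts them.
3) A thread in the system_server process throws an exception and aborts. The rest follows the same path as 2).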
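2. To investigate this kind of problem, first check the captured logs for case a), an abnormal system restart, by searching for the keyword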
AndroidRuntime START com.android.internal.os.ZygoteInit
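If the keyword appears two or more times, the system has restarted. Once you have confirmed a case a) problem, you still need to work out which of 1), 2) and 3) it is:
For case 1), search for the following keywords, then search backwards from them to confirm whether a native system process such as surfaceflinger crashed: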
ServiceManager( 1584): service 'display' died
ServiceManager( 1584): service 'usagestats' died
ServiceManager( 1584): service 'batterystats' died
For case 2), search for the keyword:
WATCHDOG KILLING SYSTEM PROCESS
For case 3), search for the keyword:
system_server
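The search above can be scripted so that every restart is examined separately. The minimal sketch below splits a log at each Zygote start line. For each boot that was followed by another Zygote start, it counts hits for the keywords of cases 1), 2) and 3). The keyword patterns are taken from the lines above. "system_server" is very broad, so treat hits for it as a hint only. The final diagnosis still requires reading backwards from the hits. The names are hypothetical.

```python
import re
from typing import Dict, Iterable, List

ZYGOTE_START = "AndroidRuntime START com.android.internal.os.ZygoteInit"

# Keywords from cases 1), 2) and 3) above; hits are hints, not verdicts.
CAUSE_PATTERNS = {
    "native_service_died": re.compile(r"ServiceManager\(\s*\d+\): service '[^']+' died"),
    "watchdog_kill": re.compile(r"WATCHDOG KILLING SYSTEM PROCESS"),
    "system_server": re.compile(r"system_server"),
}

def restart_report(lines: Iterable[str]) -> List[Dict[str, int]]:
    """Return one keyword-hit count per boot segment that ended in a restart."""
    segments: List[List[str]] = []
    for line in lines:
        if ZYGOTE_START in line:
            segments.append([])  # a new boot begins here
        elif segments:
            segments[-1].append(line)
    report = []
    for segment in segments[:-1]:  # the last boot has not (yet) ended in a restart
        report.append({
            cause: sum(1 for entry in segment if pattern.search(entry))
            for cause, pattern in CAUSE_PATTERNS.items()
        })
    return report
```

The length of the returned list is the number of restarts. An empty list means the keyword appeared at most once, so no restart happened.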
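3. Notes on analyzing problems captured in the field:
First find the PIDs of Zygote and system_server in the log files. Only then search the attached logs for the first point of failure, because once the system has restarted several times it is easy to get lost in the logs.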
Keyword for Zygote initialization:
01-01 08:01:53.770 D/AndroidRuntime( 1590): >>>>>> AndroidRuntime START com.android.internal.os.ZygoteInit <<<<<<
Keywords for Zygote creating system_server:
01-01 08:02:01.210 I/dalvikvm( 1590): System server process 2369 has been created
01-01 08:02:01.220 I/SystemServer( 2369): start SystemServer main :16606
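Pairing each Zygote start with the system_server it created gives the PID map you need before reading the logs. The sketch below assumes the older (Dalvik-era) log lines quoted above. Newer Android releases word the system_server creation message differently, so the patterns would need adjusting there. The function name is hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Patterns match the log lines quoted above (older, Dalvik-era format).
ZYGOTE_RE = re.compile(
    r"AndroidRuntime\(\s*(\d+)\): >>>>>> AndroidRuntime START "
    r"com\.android\.internal\.os\.ZygoteInit"
)
SYSTEM_SERVER_RE = re.compile(r"System server process (\d+) has been created")

def boot_pids(lines: Iterable[str]) -> List[Tuple[int, Optional[int]]]:
    """Return one (zygote_pid, system_server_pid) pair per boot, in log order."""
    boots: List[Tuple[int, Optional[int]]] = []
    for line in lines:
        m = ZYGOTE_RE.search(line)
        if m:
            boots.append((int(m.group(1)), None))
            continue
        m = SYSTEM_SERVER_RE.search(line)
        if m and boots:
            zygote_pid, _ = boots[-1]
            boots[-1] = (zygote_pid, int(m.group(1)))  # attach to the latest boot
    return boots
```

A None in the second position means the log ends, or was cut, before that boot created system_server.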
free-jdx