Memory leak analysis of a NAND monkey aging test

free-jdx 2021-04-02 15:26:31 4717
1. Field record
<6>[233147.509309] SysRq : Manual OOM execution
<4>[233147.514648] kworker/2:2 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
<6>[233147.514669] kworker/2:2 cpuset=/ mems_allowed=0
<4>[233147.514680] CPU: 2 PID: 6447 Comm: kworker/2:2 Tainted: G           O 3.10.65 #1
<4>[233147.514702] Workqueue: events moom_callback
<0>[233147.514710] Call trace:
<4>[233147.517515] [<ffffffc000088704>] dump_backtrace+0x0/0x11c
<4>[233147.517531] [<ffffffc000088840>] show_stack+0x20/0x30
<4>[233147.517545] [<ffffffc000776364>] dump_stack+0x1c/0x28
<4>[233147.517555] [<ffffffc00077504c>] dump_header.isra.13+0x90/0x1a0
<4>[233147.517567] [<ffffffc0001548e0>] oom_kill_process+0x84/0x36c
<4>[233147.517576] [<ffffffc000155050>] out_of_memory+0x268/0x290
<4>[233147.517584] [<ffffffc0003c13e0>] moom_callback+0x28/0x34
<4>[233147.517598] [<ffffffc0000b70e4>] process_one_work+0x270/0x3f0
<4>[233147.517607] [<ffffffc0000b8238>] worker_thread+0x210/0x330
<4>[233147.517619] [<ffffffc0000be100>] kthread+0xb4/0xc0
<4>[233147.517624] Mem-Info:
<4>[233147.517633] DMA per-cpu:
<4>[233147.517640] CPU    0: hi:  186, btch:  31 usd:  15
<4>[233147.517647] CPU    1: hi:  186, btch:  31 usd: 126
<4>[233147.517653] CPU    2: hi:  186, btch:  31 usd:  33
<4>[233147.517660] CPU    3: hi:  186, btch:  31 usd: 101
<4>[233147.517675] active_anon:6 inactive_anon:491 isolated_anon:0
<4>[233147.517675]  active_file:126 inactive_file:152 isolated_file:0
<4>[233147.517675]  unevictable:1727 dirty:0 writeback:1 unstable:0
<4>[233147.517675]  free:65151 slab_reclaimable:2187 slab_unreclaimable:17740
<4>[233147.517675]  mapped:364 shmem:0 pagetables:1563 bounce:0
<4>[233147.517675]  free_cma:59401
<4>[233147.517711] DMA free:260604kB min:6644kB low:30268kB high:31928kB active_anon:24kB inactive_anon:1964kB active_file:504kB inactive_file:608kB unevictable:6908kB isolated(anon):0kB isolated(file):0kB present:1032192kB managed:690172kB mlocked:0kB dirty:0kB writeback:4kB mapped:1456kB shmem:0kB slab_reclaimable:8748kB slab_unreclaimable:70960kB kernel_stack:7104kB pagetables:6252kB unstable:0kB bounce:0kB free_cma:237604kB writeback_tmp:0kB pages_scanned:43 all_unreclaimable? no
<4>[233147.517718] lowmem_reserve[]: 0 0 0
<4>[233147.517728] DMA: 503*4kB (UEM) 259*8kB (UEMC) 145*16kB (UEMC) 98*32kB (UEM) 129*64kB (UEMC) 161*128kB (UEMC) 55*256kB (UMC) 26*512kB (UMC) 9*1024kB (UMC) 14*2048kB (MC) 159*4096kB (MRC) = 754948kB
<4>[233147.517778] 2525 total pagecache pages
<4>[233147.517787] 496 pages in swap cache
<4>[233147.517793] Swap cache stats: add 38367408, delete 38366912, find 10676545/15065336
<4>[233147.517798] Free swap  = 1048kB
<4>[233147.517803] Total swap = 163836kB
<4>[233147.541740] 258048 pages RAM
<4>[233147.541754] 7902 pages reserved
<4>[233147.541759] 919895 pages shared
<4>[233147.541764] 46073 pages non-shared
<6>[233147.541770] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
<6>[233147.541822] [ 1094]     0  1094     2265       62       5       37         -1000 ueventd
<6>[233147.541845] [ 1584]     0  1584     2265       59       5       36         -1000 fswatcherd
<6>[233147.541862] [ 1619]  1023  1619     4065        0       7      207         -1000 sdcard
<6>[233147.541874] [ 1620]  1036  1620     6560        0      13     1031         -1000 logd
<6>[233147.541885] [ 1621]     0  1621     2515       71       5       49         -1000 healthd
<6>[233147.541896] [ 1622]     0  1622     3344        0       7      109         -1000 lmkd
<6>[233147.541909] [ 1623]  1000  1623     2604        0       6       81         -1000 servicemanager
<6>[233147.541920] [ 1624]     0  1624     5456        0      10      229         -1000 vold
<6>[233147.541932] [ 1625]  1000  1625    19260        0      41      746         -1000 surfaceflinger
<6>[233147.541943] [ 1626]     0  1626     2528        1       6       73         -1000 sh
<6>[233147.541955] [ 1637]     0  1637     2528       10       6       70         -1000 sh
<6>[233147.541968] [ 1691]     0  1691     2528       48       6       52         -1000 sh
<6>[233147.541979] [ 1693]     0  1693     2538        0       5      190         -1000 debuggerd
<6>[233147.541991] [ 1694]     0  1694     3016        0       7      245         -1000 debuggerd64
<6>[233147.542002] [ 1695]     0  1695     3152        0       7      114         -1000 rild
<6>[233147.542013] [ 1696]  1019  1696     6682        0      13      360         -1000 drmserver
<6>[233147.542025] [ 1698]  1012  1698     2622        1       6       80         -1000 installd
<6>[233147.542036] [ 1700]  1017  1700     3895        0       9      195         -1000 keystore
<6>[233147.542048] [ 1701]     0  1701     2202        0       5       45         -1000 multi_ir
<6>[233147.542059] [ 1702]     0  1702    16354        0      32      732         -1000 systemmixservic
<6>[233147.542071] [ 1703]     0  1703    16352        2      30      733         -1000 isomountmanager
<6>[233147.542083] [ 1704]     0  1704    16606        0      30      756         -1000 gpioservice
<6>[233147.542094] [ 1705]     0  1705    16355        0      30      733         -1000 securefileserve
<6>[233147.542106] [ 1707]     0  1707   367607        0      95     2005         -1000 main
<6>[233147.542117] [ 1708]     0  1708     4047       70       7      411         -1000 adbd
<6>[233147.542129] [ 1847]     0  1847     2489        0       6       80         -1000 logcat
<6>[233147.542142] [ 8012]     0  8012     2489        0       5       96         -1000 logcat
<6>[233147.542156] [ 6752]     0  6752   513629        0     113     2664         -1000 main
<6>[233147.542168] [ 6754]     0  6754     6209        0      12      219         -1000 netd
<6>[233147.542179] [ 6755]  1013  6755    39828        0      46     1044         -1000 mediaserver
<6>[233147.542191] [ 7214]  1000  7214   542387        0     179    10210          -941 system_server
<6>[233147.542203] [ 8091] 10010  8091   520573        0     126     7493          -705 ndroid.systemui
<6>[233147.542217] [ 8519]  1000  8519   370035        0      74     2343          -705 iracastReceiver
<6>[233147.542229] [ 8629]  1010  8629     4031        0       9      225         -1000 wpa_supplicant
<6>[233147.542240] [ 9404]  1014  9404     2578        0       6       86         -1000 dhcpcd
<6>[233147.542252] [16422] 10021 16422   520429        0     113     3643           117 putmethod.latin
<6>[233147.542267] [31474] 10018 31474   535839        0     167    11959             0 er.firelauncher
<6>[233147.542282] [ 1106] 10028  1106   376544        0     100     2980           294 ay.happyplay.aw
<6>[233147.542294] [ 1131] 10003  1131   516663        0     104     3068           294 d.process.media
<6>[233147.542305] [ 1155] 10026  1155   517634        0     101     2960           470 ftwinner.update
<3>[233147.542315] Out of memory: Kill process 1155 (ftwinner.update) score 480 or sacrifice child

/ # cpu_monitor -u 1 -m 500


 ---------------------------------H64--Mem State {unit:MB}------------------------------------
 -- Total - Memory:    977   - Swap:    159    - Vma: 245759   
    Anon     Slab    Cache   Buffer  KernStack  Total-Free   Sys-Free   Cma-Free  Swap-Free   Vma-Free 
       3       77       11        0        1         246         14        232          7     245691 
       6       77        9        0        1         245         14        231         10     245691 
       0       77        8        0        1         251         18        232          5     245691 
       0       77        7        0        1         252         19        233          4     245691 
       0       77        7        0        1         252         19        233          4     245691 
       0       77        7        0        1         252         18        233          4     245691 
       0       77        7        0        1         253         18        234          4     245691 
       0       77        7        0        1         253         18        234          4     245691 
       0       77        7        0        1         252         18        234          4     245691 
       0       77        7        0        1         252         19        233          4     245691 
       0       77        7        0        1         252         18        234          4     245691 
       0       77        7        0        1         252         18        233          4     245691 
       0       77        7        0        1         252         18        233          4     245691 
       0       77        7        0        1         252         18        234          4     245691 
       0       77        8        0        1         252         18        234          4     245691 
       0       77        7        0        1         251         19        232          4     245691 
       0       77        7        0        1         252         18        233          4     245691 
       0       77        8        0        1         252         18        234          4     245691 
2. How Android computes Lost RAM

Lost RAM = Total RAM - Used RAM - Free RAM
where Used RAM = PSS of Android user processes (including Anon) + kernel slab + KernelStack + PageTables

Free RAM = cached process PSS + kernel file cache + kernel free

The Used RAM figure above does not cover the kernel completely: it misses memory that kernel drivers allocate directly from the buddy allocator, for example GPU page alloc/DMA alloc, VE/DE CMA alloc, audio DMA alloc, binder vmalloc, and zram alloc.
Therefore Lost = kernel reserve + kernel driver page alloc
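
The three-term formula above can be checked against the dumpsys meminfo figures quoted in section 3. The sketch below is only an illustration; the function name `lost_ram_kb` is hypothetical and not part of any Android API.

```python
def lost_ram_kb(total_kb: int, used_kb: int, free_kb: int) -> int:
    """Lost RAM as defined in the text: Total RAM - Used RAM - Free RAM."""
    return total_kb - used_kb - free_kb


# Figures copied from the dumpsys meminfo output quoted in section 3.
total_kb = 1000408
used_kb = 177513 + 133716        # used pss + kernel
free_kb = 0 + 14908 + 170332     # cached pss + cached kernel + free

lost_kb = lost_ram_kb(total_kb, used_kb, free_kb)
print(used_kb, free_kb, lost_kb)
```

With these inputs the sketch yields 503939 kB, while dumpsys itself reports 500139 kB. The formula in the text is a simplification of what dumpsys computes, so an exact match is not expected; the point is that roughly half of the 1 GB of RAM is unaccounted for.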

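3. Problem description

(1) Android memory leak check
stop
start
After restarting Android with the two commands above, Lost RAM stays the same, so an Android-side memory leak is tentatively ruled out as the cause.
(2) Kernel memory leak check
kmemleak found no obvious large leak in the kernel, covering both slab allocations and file-system page allocations.

(3) Some kernel modules take memory from the buddy allocator directly through page alloc; the suspicion was that one of them misbehaves, fails to release memory, and keeps consuming it

Checking kernel memory usage with page owner shows normal volumes everywhere, with no obvious differences, so this suspect is tentatively ruled out as well.
(Note: the swap space configured for zram allocates memory directly from buddy during compression, and that memory is counted as Lost RAM.)

(4) The memory statistics themselves look abnormal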

cat /proc/pageinfo

Free pages count per migrate type at order     0     1     2     3     4     5     6     7     8     9    10
Node 0, zone DMA, type Unmovable              51   136    19   327   339   152    27     2     1     1     0
Node 0, zone DMA, type Reclaimable           227    60    26    14     2     1     1     1     0     1     0
Node 0, zone DMA, type Movable               255   616    92    30    19     2     4     2     3     4   100
Node 0, zone DMA, type Reserve                 0     0     0     0     0     0     0     0     0     0     2
Node 0, zone DMA, type CMA                  1163  1577  1027   186    15    17    19    17    15    12    16
Node 0, zone DMA, type Isolate                 0     0     0     0     0     0     0     0     0     0     0

Number of blocks type  Unmovable  Reclaimable  Movable  Reserve  CMA  Isolate
Node 0, zone DMA              33            4      138        2   75        0

Free memory computed from the pageinfo node:

order-size: number of free blocks (all migrate types)
 0-4k: 1696
 1-8k: 2389
 2-16k: 1164
 3-32k: 557
 4-64k: 375
 5-128k: 172
 6-256k: 46
 7-512k: 20
 8-1024k: 18
 9-2048k: 18
 10-4096k: 118
 free : 6784 + 19112 + 18624 + 17824 + 24000 + 22016 + 11776 + 10240 + 18432 + 36864 + 483328 = 669000 kB (653MB)
 dumpsys meminfo 
 Total RAM: 1000408 kB (status critical) 
 Free RAM: 185240 kB (0 cached pss + 14908 cached kernel + 170332 free) 
 Used RAM: 311229 kB (177513 used pss + 133716 kernel) 
 Lost RAM: 500139 kB 
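
The free total derived from the per-order block counts can be reproduced with a few lines. This is a minimal sketch that assumes 4 kB pages (order 0 = 4 kB, as in the listing) and uses the per-order counts exactly as given in the text.

```python
PAGE_KB = 4  # assumption: 4 kB pages, so an order-n block is 4 << n kB

# Free block counts for orders 0..10, summed over all migrate types,
# copied from the per-order list in the text.
blocks_per_order = [1696, 2389, 1164, 557, 375, 172, 46, 20, 18, 18, 118]


def free_kb(counts):
    """Total free memory in kB for a list of per-order free block counts."""
    return sum(n * (PAGE_KB << order) for order, n in enumerate(counts))


total_kb = free_kb(blocks_per_order)
print(total_kb, total_kb // 1024)  # kB, and whole MB
```

The result is 669000 kB, or 653 MB when rounded down, which matches the hand calculation.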
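4. Problem analysis

Findings:
(1) Suspicion 1
Lost RAM ≈ (pageinfo) free - (meminfo) free. This suggests that the free figure the system reads from /proc/meminfo is off, and that dumpsys meminfo books the whole deviation as Lost RAM.

Why does the free figure in /proc/meminfo differ from the one in pageinfo?
The free value in /proc/meminfo comes from: si_meminfo-->freeram = global_page_state(NR_FREE_PAGES)
The free value in pageinfo comes from:
walking pdata->zone->free_area[order]-->free_list[mtype]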
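
The relation behind suspicion 1 can be checked with the numbers quoted earlier. The sketch below treats the "free" term of the dumpsys Free RAM line as the /proc/meminfo value, which is how the text uses it; the variable names are illustrative only.

```python
pageinfo_free_kb = 669000   # summed from the per-order free lists
meminfo_free_kb = 170332    # "free" term in the dumpsys Free RAM line
reported_lost_kb = 500139   # Lost RAM reported by dumpsys meminfo

# Gap between the two views of free memory, and how far it is from Lost RAM.
gap_kb = pageinfo_free_kb - meminfo_free_kb
residual_kb = reported_lost_kb - gap_kb
residual_pct = 100.0 * residual_kb / reported_lost_kb

print(gap_kb, residual_kb, round(residual_pct, 2))
```

The gap is 498668 kB, within 1471 kB (about 0.3%) of the reported Lost RAM, so almost all of the "lost" memory is memory that the free lists still hold but the counter no longer reports.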

<6>[234260.975325] SysRq : Show Memory
<4>[234260.978880] Mem-Info:
<4>[234260.978889] DMA per-cpu:
<4>[234260.978897] CPU 0: hi: 186, btch: 31 usd: 30
<4>[234260.978903] CPU 1: hi: 186, btch: 31 usd: 182
<4>[234260.978919] active_anon:27 inactive_anon:140 isolated_anon:32
<4>[234260.978919] active_file:39 inactive_file:161 isolated_file:0
<4>[234260.978919] unevictable:1727 dirty:0 writeback:0 unstable:0
<4>[234260.978919] free:64562 slab_reclaimable:2160 slab_unreclaimable:17660
<4>[234260.978919] mapped:305 shmem:0 pagetables:1524 bounce:0
<4>[234260.978919] free_cma:59927
<4>[234260.978951] DMA free:258248kB min:6644kB low:30268kB high:31928kB active_anon:108kB inactive_anon:560kB active_file:156kB inactive_file:644kB unevictable:6908kB isolated(anon):128kB isolated(file):0kB present:1032192kB managed:690172kB mlocked:0kB dirty:0kB writeback:0kB mapped:1220kB shmem:0kB slab_reclaimable:8640kB slab_unreclaimable:70640kB kernel_stack:6672kB pagetables:6096kB unstable:0kB bounce:0kB free_cma:239708kB writeback_tmp:0kB pages_scanned:32 all_unreclaimable? no
<4>[234260.978957] lowmem_reserve[]: 0 0 0
<4>[234260.978968] DMA: 512*4kB (UEMC) 392*8kB (UEMC) 351*16kB (UEMC) 136*32kB (UEMC) 117*64kB (UEMC) 133*128kB (UEMC) 47*256kB (UMC) 23*512kB (UMC) 9*1024kB (UMC) 14*2048kB (MC) 159*4096kB (MRC) = 752624kB
<4>[234260.979016] 2097 total pagecache pages
<4>[234260.979025] 143 pages in swap cache
<4>[234260.979031] Swap cache stats: add 45655669, delete 45655526, find 10847251/16089610
<4>[234260.979036] Free swap = 3404kB
<4>[234260.979041] Total swap = 163836kB
<4>[234260.982533] 258048 pages RAM
<4>[234260.982533] 7902 pages reserved
<4>[234260.982533] 788206 pages shared
<4>[234260.982533] 46679 pages non-shared

DMA free:258248kB
DMA: 512*4kB (UEMC) 392*8kB (UEMC) 351*16kB (UEMC) 136*32kB (UEMC) 117*64kB (UEMC) 133*128kB (UEMC) 47*256kB (UMC) 23*512kB (UMC) 9*1024kB (UMC) 14*2048kB (MC) 159*4096kB (MRC) = 752624kB
These two figures for the same zone, taken from the same dump, differ markedly.
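
The mismatch between the two lines can be verified mechanically. The following sketch re-adds the buddy list line and compares it with the zone's "DMA free" value; the helper name `buddy_total_kb` and the regular expression are assumptions made for this illustration, not an existing tool.

```python
import re

# Both values are copied from the Show Memory dump quoted in the text.
buddy_line = (
    "DMA: 512*4kB (UEMC) 392*8kB (UEMC) 351*16kB (UEMC) 136*32kB (UEMC) "
    "117*64kB (UEMC) 133*128kB (UEMC) 47*256kB (UMC) 23*512kB (UMC) "
    "9*1024kB (UMC) 14*2048kB (MC) 159*4096kB (MRC) = 752624kB"
)
zone_free_kb = 258248  # "DMA free:" value, derived from the free-page counter


def buddy_total_kb(line: str) -> int:
    """Sum every 'count*sizekB' pair found in a buddy list line."""
    return sum(int(n) * int(size) for n, size in re.findall(r"(\d+)\*(\d+)kB", line))


listed_kb = buddy_total_kb(buddy_line)
print(listed_kb, listed_kb - zone_free_kb)
```

The free lists add up to 752624 kB, as printed by the kernel, which is 494376 kB more than the counter-based figure and close to the Lost RAM reported by dumpsys.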

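(2) Suspicion 2
The Movable type has 100 free order-10 blocks, 409600 kB (about 400 MB); across all migrate types there are 118 such blocks, 483328 kB (about 472 MB). In theory a monkey test should fragment memory more and more over time,

so why are there so many contiguous 4 MB Movable blocks? Over the course of the monkey run, the number of free Movable order-10 blocks does keep growing, and the growth tracks Lost RAM closely.
The suspicion is that, in the page_alloc path, the accounting done by __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order)) goes wrong for order-0 page requests.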
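
To illustrate how such an accounting error would show up, here is a toy model. It is not kernel code and does not reproduce the real allocator. `ToyZone` and its methods are hypothetical, and the faulty step (a counter decrement larger than the pages actually removed) is only one assumed way the suspected error could manifest.

```python
class ToyZone:
    """Toy zone: per-order free lists plus a separately maintained free-page counter."""

    def __init__(self, max_order=10):
        self.free_blocks = [0] * (max_order + 1)  # blocks on each order's free list
        self.nr_free_pages = 0                    # stands in for NR_FREE_PAGES

    def add_block(self, order):
        """Put a block on the free list and account for it in the counter."""
        self.free_blocks[order] += 1
        self.nr_free_pages += 1 << order

    def take_block(self, order, counter_pages=None):
        """Remove a block; the counter drops by 1 << order unless overridden."""
        self.free_blocks[order] -= 1
        pages = (1 << order) if counter_pages is None else counter_pages
        self.nr_free_pages -= pages

    def pages_on_lists(self):
        """Free pages found by walking the lists, as the pageinfo node does."""
        return sum(n << order for order, n in enumerate(self.free_blocks))


zone = ToyZone()
for _ in range(4):
    zone.add_block(10)        # four 4 MB blocks, correctly accounted
zone.add_block(0)
zone.take_block(0)            # healthy order-0 allocation
healthy = (zone.pages_on_lists(), zone.nr_free_pages)

zone.add_block(0)
zone.take_block(0, counter_pages=1 << 10)  # assumed fault: counter drops by 1024 pages
faulty = (zone.pages_on_lists(), zone.nr_free_pages)
print(healthy, faulty)
```

In the healthy case both views agree. After the faulty step the lists still hold 4096 pages while the counter reports 3073, which is the same pattern as in the dump: the free lists show more memory than the counter-based free figure.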
