Tracking down a memory problem
Last edited by anwafs on 2020-4-6 22:23
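Problem description:
After about an hour of continuous recording, the system goes down with a kernel oops. The failure reproduces every time.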
SAMPLE_COMM_VENC_GetVencStreamProc]-1700: create venc file suc:stream_chn1_60.h264
[03:31_20:53:06]Unable to handle kernel paging request at virtual address 66f4adff
[03:31_20:53:06]pgd = c0004000
[03:31_20:53:06][66f4adff] *pgd=00000000
[03:31_20:53:06]Internal error: Oops: 5 [#1] SMP ARM
[03:31_20:53:06]Modules linked in: media(O) pbs(O) pcs(O) osa(O) hi_mipi_rx(O) hi3516cv500_acodec(PO) hi3516cv500_adec(PO) hi3516cv500_aenc(PO) hi3516cv500_ao(PO) hi3516cv500_ai(PO) hi3516cv500_aio(PO) hi3516cv500_hdmi(PO) hi_sensor_spi(O) hi_sensor_i2c(O) hi_piris(O) hi_pwm(O) hi3516cv500_nnie(PO) hi3516cv500_ive(PO) hi3516cv500_vdec(PO) hi3516cv500_vfmw(PO) hi3516cv500_jpegd(PO) hi3516cv500_jpege(PO) hi3516cv500_h265e(PO) hi3516cv500_h264e(PO) hi3516cv500_venc(PO) hi3516cv500_rc(PO) hi3516cv500_vedu(PO) hi3516cv500_chnl(PO) hifb(O) hi3516cv500_vo(PO) hi3516cv500_vpss(PO) hi3516cv500_isp(PO) hi3516cv500_vi(PO) hi3516cv500_dis(PO) hi3516cv500_vgs(PO) hi3516cv500_gdc(PO) hi3516cv500_rgn(PO) hi3516cv500_tde(PO) hi3516cv500_sys(PO) hi3516cv500_base(PO) hi_osal(O) sys_config(O)
[03:31_20:53:06]CPU: 0 PID: 0 Comm: swapper/0 Tainted: P O 4.9.37 #13
[03:31_20:53:06]Hardware name: Generic DT based system
[03:31_20:53:06]task: c0a06200 task.stack: c0a00000
[03:31_20:53:06]PC is at h264e_get_stream_buf_stat+0x58/0x140 [hi3516cv500_h264e]
[03:31_20:53:06]LR is at osal_spin_lock_irqsave+0x10/0x18 [hi_osal]
[03:31_20:53:06]pc : [] lr : [] psr: 60000193
[03:31_20:53:06]sp : c0a01cf8 ip : 00000000 fp : bf355e18
[03:31_20:53:06]r10: 00000001 r9 : f2492fdc r8 : 00000000
[03:31_20:53:06]r7 : 00000001 r6 : ffffffff r5 : f2d49d8c r4 : f2d4a000
[03:31_20:53:06]r3 : 66f4adff r2 : fd448fcb r1 : f2f42040 r0 : a0000193
[03:31_20:53:06]Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
[03:31_20:53:06]Control: 10c5387d Table: ae25c06a DAC: 00000051
[03:31_20:53:06]Process swapper/0 (pid: 0, stack limit = 0xc0a00210)
[03:31_20:53:06]Stack: (0xc0a01cf8 to 0xc0a02000)
[03:31_20:53:06]1ce0: f249af88 000001f3
[03:31_20:53:06]1d00: a0000193 c0a02fc8 f2492fdc f2492f88 f249af88 bf34541c 00000002 00000002
[03:31_20:53:06]1d20: 00000007 00000000 a49c8f60 0000000b 02fb0975 00000000 3b9cbb30 00000000
[03:31_20:53:06]1d40: 00000032 00000000 bf249e80 00000000 32000000 60000193 f2a7f094 00000007
[03:31_20:53:06]1d60: f2a7f0cf 00000007 f2a7f109 00000007 00000000 00000000 00000000 c0a02fc8
[03:31_20:53:06]1d80: fffffff7 f2496178 00000001 bf2ee730 bf2ee65c 00000000 bf2ee648 bf2ee688
[03:31_20:53:06]1da0: 00000002 bf2e8f84 f0aea158 bf2ed738 00000000 00000000 bf2ed904 bf02c950
[03:31_20:53:06]1dc0: f0aea158 bf205744 00000000 c0a02fc8 a0000193 bf2ee6b8 bf02c950 00000000
[03:31_20:53:06]1de0: 00000000 bf2ee640 bf2ee730 00000000 bf2ee648 bf2e9424 c0a01e8c ef7d2358
[03:31_20:53:06]1e00: 00000001 c01762b4 2615f9dc 00000000 0db89d7b 00000002 00000000 c0a02fc8
[03:31_20:53:06]1e20: c0a01e4c bf2ee640 bf2ee6b8 bf2ee6b0 00000200 00000000 c0a01e8c ef7d2358
[03:31_20:53:06]1e40: 00000001 bf2e993c 0003832a 2615f9dc a0000113 c0a02fc8 ffffe000 00000100
[03:31_20:53:06]1e60: bf2e9884 c016de80 c0a0b780 ef7d4ac0 ef7d2300 00000000 c0a02d00 c016e09c
[03:31_20:53:06]1e80: c093c300 2ee96000 00000000 00000000 0000009e 00000000 00000000 ef7ec7c0
[03:31_20:53:06]1ea0: 00000000 c017c9b0 34c5df43 c0a02fc8 ef00f8c0 00000020 00000001 c0a00000
[03:31_20:53:06]1ec0: c0a02084 c0a02080 00000100 c0a02080 40000001 c011de6c 00000000 f0803000
[03:31_20:53:06]1ee0: c0a01ed8 c0a36500 0000000a 0000943e c0a02d00 00200100 f0803000 c093d130
[03:31_20:53:06]1f00: 00000000 00000000 00000001 ef008000 f0803000 ef7ec7c0 00000000 c011e27c
[03:31_20:53:06]1f20: c093d130 c015df00 c0a137c0 c0a032a0 f080200c c0a01f60 f0802000 c01014c4
[03:31_20:53:06]1f40: c0108f04 60000013 ffffffff c0a01f94 c0a0cce2 c0a00000 ef7ec7c0 c010cb8c
[03:31_20:53:06]1f60: 00000000 0011560a ef7d22e8 c0115ce0 c0a00000 c0a02fe4 00000001 c0a03038
[03:31_20:53:06]1f80: c0a0cce2 c09308c8 ef7ec7c0 00000000 60000013 c0a01fb0 c0108f00 c0108f04
[03:31_20:53:06]1fa0: 60000013 ffffffff 00000051 00000000 c0a00000 c0154cb8 ffffffff c0900d50
[03:31_20:53:06]1fc0: 00000000 ffffffff 00000000 c09006b4 c09308c8 c0a02fc8 c0a35eb0 c0a02fdc
[03:31_20:53:06]1fe0: c09308c4 c0a072f0 8000406a 410fc075 00000000 8000807c 00000000 00000000
[03:31_20:53:06][] (h264e_get_stream_buf_stat [hi3516cv500_h264e]) from [] (venc_reigster_inq_task+0x2a4/0x764 [hi3516cv500_venc])
[03:31_20:53:06][] (venc_reigster_inq_task [hi3516cv500_venc]) from [] (chnl_scheduler+0x138/0x370 [hi3516cv500_chnl])
[03:31_20:53:06][] (chnl_scheduler [hi3516cv500_chnl]) from [] (chnl_timer_int_handler+0x268/0x6c8 [hi3516cv500_chnl])
[03:31_20:53:06][] (chnl_timer_int_handler [hi3516cv500_chnl]) from [] (chnl_timer_isr+0xb8/0x114 [hi3516cv500_chnl])
[03:31_20:53:06][] (chnl_timer_isr [hi3516cv500_chnl]) from [] (call_timer_fn.constprop.1+0x28/0x98)
[03:31_20:53:06][] (call_timer_fn.constprop.1) from [] (run_timer_softirq+0x1ac/0x208)
[03:31_20:53:06][] (run_timer_softirq) from [] (__do_softirq+0xd0/0x21c)
[03:31_20:53:06][] (__do_softirq) from [] (irq_exit+0xa8/0xdc)
[03:31_20:53:06][] (irq_exit) from [] (__handle_domain_irq+0x60/0xb4)
[03:31_20:53:06][] (__handle_domain_irq) from [] (gic_handle_irq+0x48/0x8c)
[03:31_20:53:06][] (gic_handle_irq) from [] (__irq_svc+0x6c/0x90)
[03:31_20:53:06]Exception stack(0xc0a01f60 to 0xc0a01fa8)
[03:31_20:53:06]1f60: 00000000 0011560a ef7d22e8 c0115ce0 c0a00000 c0a02fe4 00000001 c0a03038
[03:31_20:53:06]1f80: c0a0cce2 c09308c8 ef7ec7c0 00000000 60000013 c0a01fb0 c0108f00 c0108f04
[03:31_20:53:06]1fa0: 60000013 ffffffff
[03:31_20:53:06][] (__irq_svc) from [] (arch_cpu_idle+0x38/0x3c)
[03:31_20:53:06][] (arch_cpu_idle) from [] (cpu_startup_entry+0xbc/0x130)
[03:31_20:53:06][] (cpu_startup_entry) from [] (start_kernel+0x444/0x478)
[03:31_20:53:06]Code: ebf269a8 e5141bd4 e59130a8 e59120a4 (e5930000)
[03:31_20:53:06]---[ end trace 40776f222d4588a8 ]---
[03:31_20:53:06]Kernel panic - not syncing: Fatal exception in interrupt
[03:31_20:53:07]SMP: failed to stop secondary CPUs
[03:31_20:53:07]---[ end Kernel panic - not syncing: Fatal exception in interrupt
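Problem analysis:
An oops basically means that kernel-mode code accessed an invalid address and raised an exception. I went through the code and found nothing wrong, but one error message stood out.
This filesystem error showed up every time:
ext4_mb_generate_buddy:758:group 3, block bitmap and bg descriptor inconsistent:4928 vs 4933 free clusters, along with some "invalid block bitmap" errors. At first I had no idea what this meant. According to what I found online, the free-cluster count recorded in the filesystem's group descriptor does not match the count derived from the block bitmap, which points to filesystem corruption.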
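Returning to the oops itself for a moment: the Code: line and the register dump in the trace contain enough to see which access faulted. The sketch below is not part of the original investigation. It decodes only the ARM (A32) load/store encoding with an immediate offset, and the function name decode_ldr_str_imm is made up for illustration.

```python
def decode_ldr_str_imm(word):
    """Decode an A32 LDR/STR/LDRB/STRB with a 12-bit immediate offset.

    Only plain offset addressing (P=1, W=0) is handled. Anything else,
    including branches and register-offset forms, returns None.
    """
    if (word >> 28) == 0xF:               # unconditional space, not LDR/STR
        return None
    if (word >> 25) & 0b111 != 0b010:     # bits 27:25 must be 010
        return None
    pre_indexed = (word >> 24) & 1        # P bit
    add = (word >> 23) & 1                # U bit: 1 = add offset, 0 = subtract
    byte = (word >> 22) & 1               # B bit
    writeback = (word >> 21) & 1          # W bit
    load = (word >> 20) & 1               # L bit
    if not pre_indexed or writeback:
        return None
    rn = (word >> 16) & 0xF               # base register
    rd = (word >> 12) & 0xF               # destination / source register
    imm = word & 0xFFF
    offset = imm if add else -imm
    op = ("ldr" if load else "str") + ("b" if byte else "")
    if offset == 0:
        operand = f"[r{rn}]"
    else:
        sign = "-" if offset < 0 else ""
        operand = f"[r{rn}, #{sign}0x{abs(offset):x}]"
    return f"{op} r{rd}, {operand}"
```

Fed the five words from the Code: line, it returns nothing for ebf269a8 (a branch, which this sketch does not handle), then ldr r1, [r4, #-0xbd4], ldr r3, [r1, #0xa8], ldr r2, [r1, #0xa4], and for the faulting word e5930000 it returns ldr r0, [r3]. The register dump shows r3 = 66f4adff, which matches the reported fault address, so the bad pointer appears to have been read from offset 0xa8 of the structure that r1 points to. That would be consistent with something overwriting that structure, although the trace alone does not say what.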
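So I focused on memory. Looking at top, the cached memory kept growing, and at first I suspected a memory leak that eventually brought the system down. But a leak should end in an OOM, and what I got was an oops. I also monitored open handles and found no leak there. So I wondered whether I could keep the cache from growing so large by having the kernel trigger cache reclaim on its own.
So: echo xxx > /proc/sys/vm/min_free_kbytes. This does not cap the cache directly. It sets how much memory the kernel tries to keep free, and once free memory drops below the configured level, a kernel thread wakes up and reclaims cache memory automatically.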
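As a rough illustration of the knob described above, here is a minimal sketch for inspecting and setting it. The helper names are hypothetical, the 400 figure is taken from the reserve mentioned in this post and is assumed to mean MiB, and writing the real file requires root.

```python
MIN_FREE_PATH = "/proc/sys/vm/min_free_kbytes"


def mib_to_kb(mib):
    """Convert MiB to the kB unit that min_free_kbytes expects."""
    return mib * 1024


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are in kB; lines without a numeric value are skipped.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def read_min_free_kbytes(path=MIN_FREE_PATH):
    """Return the current setting as an integer number of kB."""
    with open(path) as f:
        return int(f.read().strip())


def set_min_free_kbytes(kb, path=MIN_FREE_PATH):
    """Write a new value; the effect is the same as the echo command."""
    with open(path, "w") as f:
        f.write(f"{kb}\n")
```

A typical use would be to call set_min_free_kbytes(mib_to_kb(400)) once at boot, then periodically print parse_meminfo(open("/proc/meminfo").read())["MemFree"] and ["Cached"] to watch whether the cache stops growing once free memory gets near the reserve.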
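This did help: with 400M reserved, the system no longer crashed. That seemed strange at first. Why would the system crash precisely when more memory was available to it?
Thinking it through: without a large enough reserve, Linux uses all available memory for cache and keeps consuming it until free memory is very low, and the crash happened once the cache had grown past a certain size. So I suspected the memory allocation itself. In the memory layout, 1G is assigned to Linux and 1G to the MMZ region, yet it looked as if Linux could not actually use its full 1G. To test this, I wrote a program that keeps allocating 1M blocks, and somewhere past 800 allocations the system crashed...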
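The test program itself is not shown in the post. A minimal sketch of the same idea could look like the following. It assumes a Python interpreter is available on the board, and the function name, the page size of 4096 bytes, and the report callback are all illustrative choices.

```python
def allocate_until(limit_mib, chunk_mib=1, page_size=4096, report=None):
    """Allocate chunk_mib-sized buffers until limit_mib is reached.

    Every page of each buffer is written to, so the kernel has to back it
    with real memory instead of a lazily mapped zero page. Returns the
    number of MiB successfully allocated.
    """
    chunk_bytes = chunk_mib * 1024 * 1024
    chunks = []                            # keep references so nothing is freed
    try:
        for _ in range(limit_mib // chunk_mib):
            buf = bytearray(chunk_bytes)
            for off in range(0, chunk_bytes, page_size):
                buf[off] = 0xA5            # touch one byte per page
            chunks.append(buf)
            if report is not None:
                report(len(chunks) * chunk_mib)
    except MemoryError:
        pass                               # allocator refused: stop and report
    return len(chunks) * chunk_mib
```

Calling allocate_until(1024, report=print) on the target prints a running total, so the last number on the console before a crash shows roughly how much of the 1G Linux region was usable. On a healthy system the call should either reach the limit or end with the process being killed by the OOM killer, not with a kernel oops.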
So the problem lies in how the system memory is allocated...