H64 el1_entry exception debugging analysis
1. Log analysis
[ 3537.282130] PC is at do_page_fault+0x40/0x2e0
[ 3537.282130] LR is at do_translation_fault+0x5c/0xd4
[ 3537.282130] pc : [<ffffffc000095704>] lr : [<ffffffc000095a00>] pstate: 800001c5
[ 3537.282130] sp : ffffffc027b38130
[ 3537.282130] x29: ffffffc027b38130 x28: ffffffc027bdc000
[ 3537.282130] x27: ffffffc0009fa0e9 x26: 0000000000000000
[ 3537.282130] x25: 0000000096000005 x24: 0000000000000025
[ 3537.282130] x23: 0000000000000000 x22: ffffffc027b38390
[ 3537.282130] x21: 00000000000002b0 x20: 00000000000002b0
[ 3537.282130] x19: ffffffc027b38390 x18: 000000000000001e
[ 3537.282130] x17: 00000000000101d0 x16: ffffffc0111dccf4
[ 3537.282130] x15: ffffffc0111dcc04 x14: 0000000000000003
[ 3537.282130] x13: 000000004437411e x12: ffffffc000822000
[ 3537.282130] x11: 0000000000000006 x10: 0000000000000007
[ 3537.282130] x9 : 000000000000000e x8 : 00125bbb859b6f00
[ 3537.282130] x7 : 0000000000000012 x6 : ffffffc000cf77d0
[ 3537.282130] x5 : ffffffc00079e3d8 x4 : ffffffc00079e3d8
[ 3537.282130] x3 : ffffffc0000959a4 x2 : ffffffc027b38390
[ 3537.282130] x1 : 0000000096000005 x0 : 00000000800001c5
do_page_fault prologue (stack push) assembly:
ffffffc0000956c4 <do_page_fault>:
ffffffc0000956c4: a9a87bfd stp x29, x30, [sp,#-384]!
ffffffc0000956c8: 910003fd mov x29, sp
sp and x29 at the crash site:
sp : ffffffc027b38130
x29: ffffffc027b38130
x30(lr):ffffffc000095a00
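x30 holds the caller's LR. The prologue stores x30 at [sp-384+8] relative to the sp on entry, which is [sp+8] = ffffffc027b38138 relative to the crash sp. The memory at ffffffc027b38138 contains ffffffc000095a00.
Working back from SP, the LR saved on the stack matches the LR at the time of the crash, which indicates the CPU register data is consistent.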
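The check above relies on how a pre-indexed stp lays out the frame. The following Python snippet is only an illustration and is not part of the original DS5 workflow. It models `stp x29, x30, [sp, #-384]!` and computes which address holds the saved x30. `stp_pre_index` is a hypothetical helper name. The memory contents at that address still have to be read with the debugger.

```python
# Sketch: where "stp x29, x30, [sp, #-384]!" puts the saved registers.
# Pre-indexed form: sp is decremented first, then x29 -> [sp], x30 -> [sp + 8].
MASK64 = (1 << 64) - 1


def stp_pre_index(old_sp: int, offset: int) -> dict:
    """Model stp xA, xB, [sp, #offset]! and return the new sp and both slots."""
    new_sp = (old_sp + offset) & MASK64
    return {"new_sp": new_sp, "first_slot": new_sp, "second_slot": (new_sp + 8) & MASK64}


crash_sp = 0xFFFFFFC027B38130   # sp (== x29) from the crash log, after the prologue ran
entry_sp = crash_sp + 384        # sp on entry to do_page_fault, before the stp
frame = stp_pre_index(entry_sp, -384)

print(hex(entry_sp))              # 0xffffffc027b382b0
print(hex(frame["second_slot"]))  # 0xffffffc027b38138: read this address to get the saved x30
```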
2. DS5-level analysis
Connect DS5 to cpu0 --> cpu1 --> cpu2 --> cpu3 in turn.
Stop each online CPU in DS5 and load vmlinux for it.
After that, the stack of every CPU can be inspected.
Dump the current thread of each CPU with: info stack
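The stack shows the cause of the crash. The CPU accessed an illegal address, which raised an el1_sync exception.
The handler found that the cause was a data abort in EL1.
It then entered do_mem_abort for page-fault handling.
In do_page_fault the kernel found that the fault on this illegal address was raised from kernel space, and it panicked.
The crash site has been identical across several crashes, while the addr argument passed in is fairly random each time.
The current suspicion is that a pointer among the arguments passed from 32-bit user space to the 64-bit kernel is corrupted.
The CPU would then fault when it accesses that address in kernel space.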
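The main difficulty is that DS5 can only capture the crash path after the el1_sync exception was taken.
The CPU's SPSR and SP from before the exception still have to be derived from the assembly.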
el1_sync
kernel_entry (el=1): sp = sp - (288-240)
sp = sp - (15*16)
x21 = sp + 288 // sp before the exception
x22 = el1 lr (ELR_EL1)
x23 = el1 spsr (SPSR_EL1)
store lr at [sp + 240] //LR
store x21 at [sp + 240 + 8]
store x22 at [sp + 256] //PC
store x23 at [sp + 256 + 8]
el1_da:
x2 = sp
do_mem_abort:
store x29 at [sp-176]
store x30 at [sp-176+8]
sp = sp - 176
x29 = sp
do_translation_fault:
x29-->[sp-48]
x30-->[sp-48+8]
sp = sp - 48
do_page_fault:
x29-->[sp-384]
x30-->[sp-384+8]
sp = sp - 384
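The crash site from sections 1~3 had already been destroyed, so its memory could no longer be read. The problem was therefore reproduced, and the following valid data was captured: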
#0 arch_counter_get_cntvct() at arch_timer.h:153
#1 __delay(cycles = 24000) at delay.c:31
#2 __const_udelay(xloops = <Value currently has no location>) at delay.c:42
#3 panic(fmt = <Value currently has no location>) at panic.c:187
#4 die(str = <Value currently has no location>, regs = (struct pt_regs*) 0xFFFFFFC0297FC050, err = -1778384891) at traps.c:247
#5 __do_kernel_fault(mm = (struct mm_struct*) 0xFFFFFFC029BF8680, addr = 18446743833205608120, esr = 2516582405, regs = (struct pt_regs*) 0xFFFFFFC0297FC050) at fault.c:102
#6 do_translation_fault(addr = 18446743833205608120, esr = 2516582405, regs = (struct pt_regs*) 0xFFFFFFC0297FC050) at fault.c:362
#7 do_mem_abort(addr = 18446743833205608120, esr = 2516582405, regs = (struct pt_regs*) 0xFFFFFFC0297FC050) at fault.c:459
#8 [el1_sync+0xB0]
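DS5 prints the frame arguments above in decimal, which makes them hard to compare with the hex register dumps. This Python snippet converts them and decodes the ESR using the architectural ESR_ELx field layout. It is not part of the original workflow, and only the four translation-fault DFSC codes are tabulated. After conversion, addr is 0xFFFFFFC800D90EB8, the same value that appears as X20 in section (5). esr is 0x96000005, the syndrome also seen in x1/x25 of the first log.

```python
# Sketch: convert DS5's decimal frame arguments to hex and split the ESR
# into its fields (ESR_ELx: EC = bits[31:26], IL = bit 25, ISS = bits[24:0]).
addr = 18446743833205608120   # do_mem_abort(addr = ...)
esr = 2516582405              # do_mem_abort(esr = ...)
err = -1778384891             # die(err = ...): the same ESR seen as a signed 32-bit int

# Data fault status codes for translation faults (ISS bits[5:0] of a data abort).
DFSC_NAMES = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}

ec = (esr >> 26) & 0x3F   # exception class; 0x25 = data abort from the current EL
il = (esr >> 25) & 0x1    # 1 = the trapped instruction was 32 bits long
iss = esr & 0x1FFFFFF     # instruction-specific syndrome
wnr = (iss >> 6) & 0x1    # write-not-read: 0 = the faulting access was a read
dfsc = iss & 0x3F         # data fault status code

print(hex(addr))                        # 0xffffffc800d90eb8
print(hex(esr), hex(err & 0xFFFFFFFF))  # 0x96000005 0x96000005
print(hex(ec), il, wnr, DFSC_NAMES.get(dfsc, hex(dfsc)))
# 0x25 1 0 translation fault, level 1
```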
(1)#11:try_to_wake_up
In #10, the sp adjustment and the push of x30 are as follows:
#11-x29 --> #11-sp -80
#11-x30 --> #11-sp -80 +8
#10-sp = #11-sp -80 = 0xFFFFFFC0297FC170
#11-x29 = 0xFFFFFFC0297FC1C0 (captured by DS5)
#11-x30 = 0xFFFFFFC0000CEC7C (captured by DS5)
#11-SP = 0xFFFFFFC0297FC1C0
LR = X30 = 0xFFFFFFC0000CEC7C
Assembly:
ffffffc0000cea74 <try_to_wake_up>:
... ...
... ...
ffffffc0000cec78: 97ffecd8 bl ffffffc0000c9fd8 <ttwu_stat>
-->ffffffc0000cec7c: 14000020 b ffffffc0000cecfc <try_to_wake_up+0x288>
... ...
... ...
x19 register: 0xFFFFFFC012DD3440
(2)#10: ttwu_stat
#10-cpsr = #9-spsr = 0x00000000800001C5
M[4:0] = 0b00101: AArch64 EL1h (EL1 exception mode); M[0] = 0b1 selects SP_EL1 as SP
#10-sp = #9-sp = 0xFFFFFFC0297FC170
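The mode decoding above can be done mechanically. The snippet below is an illustrative Python sketch using the SPSR field positions from the Arm architecture. `decode_spsr` is a hypothetical helper. It decodes the same value 0x800001C5. It also shows the bits the text does not discuss: in the saved state, SError (A), IRQ (I) and FIQ (F) are masked, debug (D) is not, and the N flag is set.

```python
# Sketch: decode the mode and flag bits of an AArch64 SPSR_EL1 value.
MODES = {
    0b00000: "EL0t", 0b00100: "EL1t", 0b00101: "EL1h",
    0b01000: "EL2t", 0b01001: "EL2h", 0b01100: "EL3t", 0b01101: "EL3h",
}


def decode_spsr(spsr: int) -> dict:
    def bit(n: int) -> bool:
        return bool((spsr >> n) & 1)

    m = spsr & 0x1F
    return {
        "mode": MODES.get(m, f"unknown M[4:0]={m:#07b}"),
        "aarch32": bit(4),   # M[4]: 0 = exception taken from AArch64
        "sp_elx": bit(0),    # M[0]: 1 = SP_ELx (the "h" modes), 0 = SP_EL0
        "D": bit(9), "A": bit(8), "I": bit(7), "F": bit(6),   # 1 = masked
        "N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28),
    }


info = decode_spsr(0x00000000800001C5)   # #10-cpsr == #9-spsr
print(info)   # mode EL1h, sp_elx True, A/I/F masked, D clear, N set
```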
Locating the code from #9-lr 0xFFFFFFC0000CA014:
ffffffc0000c9fd8 <ttwu_stat>:
ffffffc0000c9fd8: a9bb7bfd stp x29, x30, [sp,#-80]!
ffffffc0000c9fdc: 910003fd mov x29, sp
ffffffc0000c9fe0: a90153f3 stp x19, x20, [sp,#16]
ffffffc0000c9fe4: a9025bf5 stp x21, x22, [sp,#32]
ffffffc0000c9fe8: a90363f7 stp x23, x24, [sp,#48]
ffffffc0000c9fec: f90023f9 str x25, [sp,#64]
ffffffc0000c9ff0: 90006656 adrp x22, ffffffc000d91000 <__key.22563>
ffffffc0000c9ff4: aa0003f3 mov x19, x0
ffffffc0000c9ff8: 9102e2d6 add x22, x22, #0xb8
ffffffc0000c9ffc: aa1e03e0 mov x0, x30
ffffffc0000ca000: 2a0103f8 mov w24, w1
ffffffc0000ca004: 2a0203f7 mov w23, w2
ffffffc0000ca008: 97ff185a bl ffffffc000090170 <_mcount>
ffffffc0000ca00c: b00054b5 adrp x21, ffffffc000b5f000 <cpu_worker_pools+0x440>
ffffffc0000ca010: 940a6f4c bl ffffffc000365d40 <debug_smp_processor_id>
--->ffffffc0000ca014: f8605ad4 ldr x20, [x22,w0,uxtw #3]
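In EL1 exception mode, el1_sync --> el1_da loads the X0 that it passes to do_mem_abort as follows:
mrs X0, far_el1 // EL1 FAR (Fault Address Register)
The faulting address in X0 was 0x199999940015B4AC. In routine testing this value turned out to be very random.
The current suspicion is that the exception occurred while the CPU was executing ldr x20, [x22,w0,uxtw #3] and accessing the address computed from these registers. Once the exception is taken, el1 lr (ELR_EL1) points to the instruction that triggered it.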
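1) Checking x22:
In #10, the caller's x22 is saved on the stack at [0xFFFFFFC0297FC170+32+8] = [0xFFFFFFC0297FC198]. DS5 reads it as x22: 0x0000000000000000.
The X22 saved on the stack in #9, as captured by DS5, is 0xFFFFFFC000d910b8.
First, take the x22 saved in #10's stack together with the code and recompute x22:
adrp x22, ffffffc000d91000 <__key.22563> // gives x22 = ffffffc000d91000
add x22, x22, #0xb8 // gives x22 = ffffffc000d910b8
The recomputed x22 is ffffffc000d910b8, which matches the value saved for #9-x22.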
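The disassembler already prints the ADRP target, but the same value can also be recomputed from the raw instruction word. This gives an independent check of the x22 derivation. The snippet is a Python sketch based on the A64 ADRP encoding. `adrp_target` and `sign_extend` are hypothetical helper names. The instruction word 0x90006656 and its address are taken from the ttwu_stat listing.

```python
# Sketch: recompute x22 in ttwu_stat from the ADRP instruction word itself.
# A64 ADRP fields: bit 31 = 1, bits[28:24] = 0b10000, immlo = bits[30:29],
# immhi = bits[23:5], Rd = bits[4:0]; target = (PC & ~0xFFF) + (imm << 12).
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def adrp_target(pc: int, insn: int) -> tuple:
    """Return (Rd, target address) for the ADRP instruction word at pc."""
    if (insn >> 31) != 1 or ((insn >> 24) & 0x1F) != 0b10000:
        raise ValueError(f"{insn:#010x} is not an ADRP instruction")
    immlo = (insn >> 29) & 0x3
    immhi = (insn >> 5) & 0x7FFFF
    imm = sign_extend((immhi << 2) | immlo, 21)
    return insn & 0x1F, ((pc & ~0xFFF) + (imm << 12)) & MASK64


rd, page = adrp_target(0xFFFFFFC0000C9FF0, 0x90006656)  # adrp x22, ffffffc000d91000
x22 = (page + 0xB8) & MASK64                             # add  x22, x22, #0xb8
print(rd, hex(page), hex(x22))   # 22 0xffffffc000d91000 0xffffffc000d910b8
```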
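2) Checking the address formed by [x22, w0, uxtw #3]
x22 = 0xffffffc000d910b8
offset = ((unsigned long)w0) << 3 // uxtw #3: zero-extend w0 to 64 bits, then shift left by 3
x22 + offset = 0x199999940015B4AC ?
Working backwards: offset = 0x199999D3FF3CA3F4 ?
w0 = offset >> 3 = 0x333333A7FE7947E ?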
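The question marks above can be settled mechanically. The offset is a zero-extended 32-bit value scaled by 8. Any address this ldr can form must therefore differ from x22 by a multiple of 8 and by at most 0x7FFFFFFF8. The Python sketch below checks this. `solve_w0` and `ldr_uxtw3_address` are hypothetical helpers. For 0x199999940015B4AC the difference 0x199999D3FF3CA3F4 has low bits 0b100, so with x22 = 0xffffffc000d910b8 no 32-bit w0 produces it. The same check can be applied to the addr from the second reproduction's backtrace, 0xFFFFFFC800D90EB8. It gives w0 = 0xFFFFFFC0, which equals the low 32 bits of the X0 that kernel_entry saved at 0xFFFFFFC0297FC050.

```python
# Sketch: forward and reverse versions of the address formed by
# "ldr x20, [x22, w0, uxtw #3]": x22 + (ZeroExtend64(w0) << 3), modulo 2^64.
MASK64 = (1 << 64) - 1
X22 = 0xFFFFFFC000D910B8


def ldr_uxtw3_address(x22: int, w0: int) -> int:
    return (x22 + ((w0 & 0xFFFFFFFF) << 3)) & MASK64


def solve_w0(x22: int, fault_addr: int):
    """Return the 32-bit w0 that yields fault_addr, or None if no w0 can."""
    offset = (fault_addr - x22) & MASK64
    if offset & 0x7:              # the scaled offset is always a multiple of 8
        return None
    w0 = offset >> 3
    return w0 if w0 <= 0xFFFFFFFF else None   # w0 is only 32 bits wide


offset = (0x199999940015B4AC - X22) & MASK64
print(hex(offset), hex(offset >> 3), offset & 0x7)  # 0x199999d3ff3ca3f4 0x333333a7fe7947e 4
print(solve_w0(X22, 0x199999940015B4AC))             # None
print(hex(solve_w0(X22, 0xFFFFFFC800D90EB8)))        # 0xffffffc0
```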
#8: CPU register state in el1_sync:
PC 0xFFFFFFC000083C30
SP 0xFFFFFFC0297FC050
W0 0x00001317 // abnormal value
W1 0xCBD6EEA0
W2 0x0000000C
W3 0xCBD701B6
W4 0x00000001
W5 0x0035EEBC
W6 0x00CD2E21
W7 0x2064656C
W8 0x20706F74
W9 0x7F7F7F7F
W10 0xFEFEFEFF
W11 0x7F7F7F7F
W12 0x01010101
W13 0x00000038
W14 0xFFFFFFFE
W15 0x00000000
W16 0x001E1B30
W17 0x00000000
W18 0x00000000
W19 0x00005DC0
W20 0x001DC004
W21 0x00000001
W22 0x001DC068
W23 0x00000056
W24 0x96000005
W25 0x00D91000
W26 0x00B5F000
W27 0x009FA0E9
W28 0x297FC000
W29 0x297FBE10
W30 0x00353AEC
(3) Deriving the stack data saved in EL1 mode from the #8 data and the el1_sync code flow
In EL1 mode:
#9-sp = #8-sp + (15*16) + (288-240) = #8-sp + 288 = 0xFFFFFFC0297FC170
From the code, #8-x21 = #8-sp + 288. The captured #8-x21 is 0xFFFFFFC0297FC170, so the derivation matches the CPU data.
The derived #9-sp also matches the captured CPU state, so #9-sp is correct.
#9-lr = [#8-sp+240] = [0xFFFFFFC0297FC050+240] = [0xFFFFFFC0297FC140] = (DS5 dump memory) 0xFFFFFFC0000CA014
#9-el1 lr = [#8-sp+256] = [0xFFFFFFC0297FC050+256] = [0xFFFFFFC0297FC150] = (DS5 dump memory) 0xFFFFFFC0000CA014
In the code, x22 = el1 lr (ELR_EL1). The captured #8-x22 is 0xFFFFFFC0000CA014, which matches el1 lr.
#9-spsr = [0xFFFFFFC0297FC050+256+8] = [0xFFFFFFC0297FC158] = (DS5 dump memory) 0x00000000800001C5
From the code, x23 = el1 spsr. The captured x23 is 0x00000000800001C5, so the derivation agrees with the CPU state.
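The four address calculations above can be generated from one place. The Python sketch below assumes the offsets quoted in this document: frame size 288, lr/sp pair at +240, pc/pstate pair at +256. These offsets depend on the kernel version's struct pt_regs and are not universal. `pt_regs_slots` is a hypothetical helper.

```python
# Sketch: slot addresses that kernel_entry (el=1) fills in, assuming
# S_FRAME_SIZE = 288, S_LR = 240 and S_PC = 256 as used in the text above.
S_FRAME_SIZE, S_LR, S_PC = 288, 240, 256


def pt_regs_slots(regs_base: int) -> dict:
    return {
        "x0": regs_base,                       # first pushed pair ends up lowest
        "lr": regs_base + S_LR,                # stp lr, x21 -> lr at +240
        "sp": regs_base + S_LR + 8,            # x21 = sp before the exception
        "pc": regs_base + S_PC,                # stp x22, x23 -> ELR_EL1 at +256
        "pstate": regs_base + S_PC + 8,        # SPSR_EL1 at +264
        "sp_before": regs_base + S_FRAME_SIZE,
    }


slots = pt_regs_slots(0xFFFFFFC0297FC050)   # #8-sp from the el1_sync dump
for name, addr in slots.items():
    print(f"{name:9s} {addr:#018x}")
```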
In EL1h mode, kernel_entry saves the X0~X29 registers as they were before the exception.
Assembly:
sp = sp - (288-240)// = 0xFFFFFFC0297FC140
push x28, x29 // stp \xreg1,\xreg2,[sp,#-16]!
push x26, x27
push x24, x25
push x22, x23
push x20, x21
push x18, x19
push x16, x17
push x14, x15
push x12, x13
push x10, x11
push x8, x9
push x6, x7
push x4, x5
push x2, x3
push x0, x1
Memory captured by DS5 from [#9-sp -48] down to [#9-sp -48 -240]:
sp[0xFFFFFFC0297FC140] ~ [0xFFFFFFC0297FC050]
Working back from the kernel_entry assembly, the registers are laid out on the stack as:
X28-->[sp-16]:0xFFFFFFC0297FC130 = 0xFFFFFFC0297FC000
X29-->[sp-8]:0xFFFFFFC0297FC138 = 0xFFFFFFC0297FC170
The resulting register layout:
EL1N:0xFFFFFFC0297FC050: X0 0x00000000FFFFFFC0 X1 0x0000000000000000 X2 0x0000000000000000 X3 0x0000000000000200 X4 0x0000000000000000 X5 0x0000000000000044 X6 0xFFFFFFC000CDB33C
EL1N:0xFFFFFFC0297FC088: X7 0x0000000000000000 X8 0xFFFFFFC000CDB33C X9 0x7F7F7F7F7F7F7F7F X10 0x67531F534F4C4444 X11 0x7F7F7F7F7F7F7F7F X12 0x0101010101010101 X13 0x0000000000000028
EL1N:0xFFFFFFC0297FC0C0: X14 0xFFFFFFFFFFFFFFFF X15 0x0000000000000000 X16 0xFFFFFFC0001E1B30 X17 0x0000000000000000 X18 0x0000000000000000 X19 0xFFFFFFC012DD3440 X20 0x0000000000000001
EL1N:0xFFFFFFC0297FC0F8: X21 0xFFFFFFC000B5F000 X22 0xFFFFFFC000D910B8 X23 0x0000000000000000 X24 0x0000000000000000 X25 0xFFFFFFC000D91000 X26 0xFFFFFFC000B5F000 X27 0xFFFFFFC0009FA0E9
EL1N:0xFFFFFFC0297FC130: X28 0xFFFFFFC0297FC000 X29 0xFFFFFFC0297FC170
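The layout above was obtained by reversing the push sequence by hand. The same result follows from replaying the 15 push macros forward from sp = 0xFFFFFFC0297FC140. The Python sketch below is illustrative, and `replay_pushes` is a hypothetical helper. The addresses it prints for x0, x7, x14, x21 and x28 are exactly the row starts of the EL1N dump.

```python
# Sketch: replay the 15 "push a, b" macros of kernel_entry
# (each is stp a, b, [sp, #-16]!) to find where every Xn lands.
PUSH_ORDER = [(28, 29), (26, 27), (24, 25), (22, 23), (20, 21),
              (18, 19), (16, 17), (14, 15), (12, 13), (10, 11),
              (8, 9), (6, 7), (4, 5), (2, 3), (0, 1)]


def replay_pushes(start_sp: int) -> dict:
    sp, where = start_sp, {}
    for first, second in PUSH_ORDER:
        sp -= 16                      # pre-index writeback happens first
        where[f"x{first}"] = sp       # first register of the pair at [sp]
        where[f"x{second}"] = sp + 8  # second register at [sp + 8]
    where["final_sp"] = sp
    return where


layout = replay_pushes(0xFFFFFFC0297FC140)   # sp after sub sp, sp, #(288-240)
for n in (0, 7, 14, 21, 28, 29):
    print(f"x{n:<2} {layout[f'x{n}']:#018x}")
```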
(4)
el1_sync-->kernel_entry
el=1
sp = sp - (288-240)
sp = sp - (15*16) // push X0-X29
x21 = sp + 288
x22 = el1 lr (ELR_EL1)
x23 = el1 spsr (SPSR_EL1)
store lr at [sp + 240] //LR
store x21 at [sp + 240 + 8]
store x22 at [sp + 256] //PC
store x23 at [sp + 256 + 8]
el1_da:
x2 = sp
Crash-site stack data:
X19 0xFFFFFFC012DD3440
X20 0x0000000000000001
X21 0xFFFFFFC0297FC170
X22 0xFFFFFFC0000CA014
X23 0x00000000800001C5
X24 0x0000000000000025
X25 0xFFFFFFC000D91000
X26 0xFFFFFFC000B5F000
X27 0xFFFFFFC0009FA0E9
X28 0xFFFFFFC0297FC000
X29 0xFFFFFFC0297FC170
PC 0xFFFFFFC000083C30
SP 0xFFFFFFC0297FC050
Derivation: #8-sp = #7-sp + 176 = 0xFFFFFFC0297FBFA0 + 176 = 0xFFFFFFC0297FC050
The #8-sp derived from #7-sp matches the SP captured from the CPU, and #8-sp is consistent with #8-X21 (X21 = sp + 288), so the data is sound.
(5)
data_bad-->do_mem_abort
store x29 at [sp-176]
store x30 at [sp-176+8]
sp = sp - 176
x29 = sp
Crash-site stack data:
X19 0x0000000096000005
X20 0xFFFFFFC800D90EB8
X21 0xFFFFFFC000B70E90
X22 0xFFFFFFC0297FC050
X23 0x00000000800001C5
X24 0x0000000000000025
X25 0xFFFFFFC000D91000
X26 0xFFFFFFC000B5F000
X27 0xFFFFFFC0009FA0E9
X28 0xFFFFFFC0297FC000
X29 0xFFFFFFC0297FBFA0
PC 0xFFFFFFC000081238
SP 0xFFFFFFC0297FBFA0
Derivation: #7-sp = #6-sp + 48 = 0xFFFFFFC0297FBF70 + 48 = 0xFFFFFFC0297FBFA0
The #7-sp derived from #6-sp matches the SP captured from the CPU, and #7-sp equals #7-X29, so the data is sound.
(6)
do_mem_abort-->do_translation_fault
x29-->[sp-48]
x30-->[sp-48+8]
sp = sp - 48
x29 = sp
Crash-site stack data:
X19 0xFFFFFFC0297FC050
X20 0xFFFFFFC800D90EB8
X21 0x0000000096000005
X22 0xFFFFFFC029BF8680
X23 0x00000000800001C5
X24 0x0000000000000025
X25 0xFFFFFFC000D91000
X26 0xFFFFFFC000B5F000
X27 0xFFFFFFC0009FA0E9
X28 0xFFFFFFC0297FC000
X29 0xFFFFFFC0297FBF70
PC 0xFFFFFFC000095A64
SP 0xFFFFFFC0297FBF70
Conclusion: #6-sp = #5-sp + 48 = 0xFFFFFFC0297FBF40 + 48 = 0xFFFFFFC0297FBF70
The #6-sp derived from #5-sp matches the captured stack data, and #6-sp equals #6-X29, so the data is sound.
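Sections (1) to (6) each verify one link of the frame chain. The Python sketch below checks all of them in one pass. It uses only sp values and frame sizes stated in this document, including the text's statement that #10-sp equals #9-sp. `check_chain` is a hypothetical helper.

```python
# Sketch: check every link of the reconstructed frame chain at once.
# allocated[f] = caller_sp - callee_sp for frame f, as derived in the text.
observed_sp = {
    "#5": 0xFFFFFFC0297FBF40,
    "#6": 0xFFFFFFC0297FBF70,
    "#7": 0xFFFFFFC0297FBFA0,
    "#8": 0xFFFFFFC0297FC050,
    "#9": 0xFFFFFFC0297FC170,
    "#10": 0xFFFFFFC0297FC170,
    "#11": 0xFFFFFFC0297FC1C0,
}
allocated = {
    "#5": 48,    # #6-sp = #5-sp + 48
    "#6": 48,    # do_translation_fault: stp x29, x30, [sp, #-48]!
    "#7": 176,   # do_mem_abort: stp x29, x30, [sp, #-176]!
    "#8": 288,   # kernel_entry: pt_regs frame
    "#9": 0,     # the text takes #10-sp == #9-sp
    "#10": 80,   # ttwu_stat: stp x29, x30, [sp, #-80]!
}
ORDER = ["#5", "#6", "#7", "#8", "#9", "#10", "#11"]


def check_chain(observed: dict, sizes: dict, order: list) -> list:
    """Return (caller, expected, observed) for every link that does not match."""
    mismatches = []
    for callee, caller in zip(order, order[1:]):
        expected = observed[callee] + sizes[callee]
        if expected != observed[caller]:
            mismatches.append((caller, hex(expected), hex(observed[caller])))
    return mismatches


print(check_chain(observed_sp, allocated, ORDER))   # [] when every link matches
```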
free-jdx