Question
I have the following stack trace and crash information after the Linux kernel failed to load:
[ 3.684670] ------------[ cut here ]------------
[ 3.695507] Bad FPU state detected at fpu__clear+0x91/0xc2, reinitializing FPU registers.
[ 3.695508] traps: No user code available.
[ 3.704745] invalid opcode: 0000 [#1] PREEMPT
[ 3.715304] CPU: 0 PID: 1 Comm: swapper Not tainted 4.19.50-android-x86-geeb7e76-dirty #1
[ 3.724594] Hardware name: AAEON UP-APL01/UP-APL01, BIOS UPA1AM21 09/01/2017
[ 3.732622] EIP: ex_handler_fprestore+0x2e/0x65
[ 3.737807] Code: 00 55 89 e5 57 8b 48 04 8d 44 08 04 89 42 30 80 3d e7 fb a0 c1 00 75 16 c6 05 e7 fb a0 c1 01 50 68 b4 38 87 c1 e8 98 ba 00 00 <0f> 0b 58 5a 90 8d 74 26 00 eb f
[ 3.759027] EAX: 0000004d EBX: c103d6f9 ECX: c19a2a48 EDX: c19a2a48
[ 3.766169] ESI: df4c7e04 EDI: 00000006 EBP: df4c7c6c ESP: df4c7c60
[ 3.773316] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00010292
[ 3.781044] CR0: 80050033 CR2: c168c6b4 CR3: 1e902000 CR4: 001406d0
[ 3.788184] Call Trace:
[ 3.791026] ? fpu__clear+0x91/0xc2
[ 3.795037] fixup_exception+0x61/0x6e
[ 3.799348] do_trap+0x35/0xe9
[ 3.802864] do_invalid_op+0xd9f/0x108a
[ 3.807269] ? atime_needs_update+0x68/0xf5
[ 3.812058] ? touch_atime+0x37/0xbd
[ 3.816168] ? __check_object_size+0x83/0x123
[ 3.821153] ? fpu__clear+0x8e/0xc2
[ 3.825166] ? generic_file_read_iter+0x28d/0x723
[ 3.830544] ? generic_file_read_iter+0x28d/0x723
[ 3.835931] ? __vfs_read+0xe9/0x11f
[ 3.840043] common_exception+0x105/0x10e
[ 3.844634] EIP: fpu__clear+0x91/0xc2
[ 3.848840] Code: eb 05 e8 b4 f2 fd ff ff 0d 98 a8 99 c1 74 3b 90 8d 74 26 00 eb 07 90 8d 74 26 00 eb 1c 83 c8 ff bf c0 8c a2 c1 89 c2 0f c7 1f <a1> f4 8b a2 c1 ff 0d 98 a8 99 1
[ 3.870070] EAX: ffffffff EBX: df4c5900 ECX: 00000000 EDX: ffffffff
[ 3.877210] ESI: df4c5900 EDI: c1a28cc0 EBP: df4c7e4c ESP: df4c7e40
[ 3.884356] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00010286
[ 3.892085] ? do_alignment_check+0x1a/0x1a
[ 3.896878] ? common_exception+0x105/0x10e
[ 3.901674] flush_thread+0x33/0x37
[ 3.905684] flush_old_exec+0x540/0x5f9
[ 3.910085] load_elf_binary+0x24b/0xec1
[ 3.914584] ? pick_next_task_fair+0xdf/0x13a
[ 3.919575] ? __schedule+0x4bb/0x63f
[ 3.923780] ? sched_debug_header+0x45/0x40a
[ 3.928666] ? preempt_schedule+0x2d/0x3c
[ 3.933266] search_binary_handler+0x89/0x1ac
[ 3.938259] load_script+0x184/0x19f
[ 3.942366] search_binary_handler+0x89/0x1ac
[ 3.947354] __do_execve_file+0x454/0x668
[ 3.951954] do_execve+0x1b/0x1d
[ 3.955673] run_init_process+0x31/0x36
[ 3.960082] ? rest_init+0x99/0x99
[ 3.963992] kernel_init+0x5e/0xdf
[ 3.967905] ret_from_fork+0x19/0x30
[ 3.972014] Modules linked in:
[ 3.975542] ---[ end trace 7d27fceeb3852a38 ]---
[ 3.980823] EIP: ex_handler_fprestore+0x2e/0x65
[ 3.986014] Code: 00 55 89 e5 57 8b 48 04 8d 44 08 04 89 42 30 80 3d e7 fb a0 c1 00 75 16 c6 05 e7 fb a0 c1 01 50 68 b4 38 87 c1 e8 98 ba 00 00 <0f> 0b 58 5a 90 8d 74 26 00 eb f
[ 4.007247] EAX: 0000004d EBX: c103d6f9 ECX: c19a2a48 EDX: c19a2a48
[ 4.014387] ESI: df4c7e04 EDI: 00000006 EBP: df4c7c6c ESP: c1afa3b0
[ 4.021536] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00010292
[ 4.029265] CR0: 80050033 CR2: c168c6b4 CR3: 1e902000 CR4: 001406d0
[ 4.036413] note: swapper[1] exited with preempt_count 1
What does the Code line mean? Also, can I find out the exact x86 instruction (not the C function) that caused the kernel to crash?
EDIT: Updated the code. I was trying to run Linux in a virtualized environment.
Answer 1:
Code is a hexdump of x86 machine code (presumably 32-bit mode from a legacy 32-bit kernel, since it only dumped 32-bit register contents).
The byte marked with <> is where EIP is pointing, so it's the first byte of the faulting instruction inside ex_handler_fprestore.
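As a rough illustration of reading the dump by hand, the sketch below pulls the raw bytes out of a Code: line and reports the index of the byte wrapped in <>. The helper name parse_code_line and the variable OOPS_LINE are made up for this example; the hex string is copied from the oops above, and the truncated trailing token ("f") is simply skipped.

```python
def parse_code_line(line):
    """Return (code_bytes, fault_index) from a kernel oops 'Code:' line.

    fault_index is the position of the byte printed as <xx>, or None
    if no byte is marked.
    """
    # Ignore the timestamp and anything else before "Code:".
    hex_part = line.split("Code:", 1)[1]
    data = bytearray()
    fault_index = None
    for token in hex_part.split():
        marked = token.startswith("<") and token.endswith(">")
        digits = token.strip("<>")
        if len(digits) != 2:
            continue  # truncated trailing token, e.g. a lone "f"
        if marked:
            fault_index = len(data)
        data.append(int(digits, 16))
    return bytes(data), fault_index


# Hex dump copied from the oops message in the question.
OOPS_LINE = (
    "[ 3.737807] Code: 00 55 89 e5 57 8b 48 04 8d 44 08 04 89 42 30 80 3d "
    "e7 fb a0 c1 00 75 16 c6 05 e7 fb a0 c1 01 50 68 b4 38 87 c1 e8 98 ba "
    "00 00 <0f> 0b 58 5a 90 8d 74 26 00 eb f"
)

code, fault_index = parse_code_line(OOPS_LINE)
print(hex(fault_index), code[fault_index:fault_index + 2].hex())
```

The two bytes at the marked position are 0f 0b, which is the encoding of ud2. The resulting bytes can then be handed to whichever disassembler you prefer.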
Feed it to a disassembler, e.g. https://defuse.ca/online-x86-assembler.htm#disassembly2, or Linux's crashdump decoding script https://elixir.bootlin.com/linux/latest/source/scripts/decodecode
Remember that x86 machine code uses a variable-length encoding that can't be unambiguously decoded backwards. But this is compiler-generated code, so at least we can assume there aren't supposed to be overlapping instructions or static data mixed with code (because there's no benefit to that on x86). If we find the start of a function in compiler-generated code, the rest of the instructions will all be "sane".
The 00 byte looks like part of a previous instruction or padding between functions. Decoding from there would give us add BYTE PTR [ebp-0x77],dl, which is plausible, but the in eax,0x57 after that isn't, for a non-driver function.
Much more likely is that the 0x89 byte is the opcode of a MOV instruction.
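To see where the bogus add comes from, here is a small hand-decoding sketch of the first three bytes (00 55 89). Opcode 00 is ADD r/m8, r8, so the next byte is a ModRM byte and, with mod = 01, a signed 8-bit displacement follows. The helper names and register tables are written out just for this example; they cover only this one addressing form, not a general decoder.

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) bit fields."""
    return byte >> 6, (byte >> 3) & 7, byte & 7


def sign_extend8(byte):
    """Interpret an unsigned byte as a signed 8-bit displacement."""
    return byte - 0x100 if byte & 0x80 else byte


opcode, modrm, disp8 = 0x00, 0x55, 0x89
mod, reg, rm = split_modrm(modrm)

# mod == 1 with rm != 4 means [reg32 + disp8] with no SIB byte.
assert opcode == 0x00 and mod == 1 and rm != 4
disp = sign_extend8(disp8)
text = f"add BYTE PTR [{REG32[rm]}{disp:+#x}],{REG8[reg]}"
print(text)  # add BYTE PTR [ebp-0x77],dl
```

Starting one byte late swallows the 89 that really belongs to mov ebp,esp, which is why the following bytes then decode as an unlikely in instruction.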
If we drop the 00 byte and start from 55 (which is push ebp), we get a normal function body including the stack-frame setup prologue you'd expect if compiled with -Os or -fno-omit-frame-pointer.
In general, you can drop bytes one at a time until you get a sane-looking decoding that at least has an instruction-boundary on the faulting instruction. (But some experience is required for "sane-looking"; disassembly may have gotten in sync by chance after starting wrong. That's not rare for x86 machine code.)
# skipped the 00 byte which would desync decoding
0: 55 push ebp
1: 89 e5 mov ebp,esp
3: 57 push edi
4: 8b 48 04 mov ecx,DWORD PTR [eax+0x4] # EAX = 1st function arg, ECX = tmp
7: 8d 44 08 04 lea eax,[eax+ecx*1+0x4]
b: 89 42 30 mov DWORD PTR [edx+0x30],eax # EDX = 2nd function arg
e: 80 3d e7 fb a0 c1 00 cmp BYTE PTR ds:0xc1a0fbe7,0x0
15: 75 16 jne 0x2d
17: c6 05 e7 fb a0 c1 01 mov BYTE PTR ds:0xc1a0fbe7,0x1
1e: 50 push eax
1f: 68 b4 38 87 c1 push 0xc18738b4
24: e8 98 ba 00 00 call 0xbac1
29: 0f 0b ud2 ### <=== EIP points here
# stuff after this probably isn't real code; it's unreachable
2b: 58 pop eax
2c: 5a pop edx
2d: 90 nop
2e: 8d 74 26 00 lea esi,[esi+eiz*1+0x0]
32: eb .byte 0xeb
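The call 0xbac1 in the listing is not an absolute kernel address: e8 takes a 32-bit little-endian displacement relative to the end of the call instruction, and the disassembler here was given the bytes starting at offset 0. A minimal sketch of that arithmetic, with rel32_target as a made-up helper name:

```python
import struct


def rel32_target(code, call_offset, base=0):
    """Target of an E8 rel32 call found at call_offset within code.

    base is the address assigned to code[0]; the result is in the same
    address space as base.
    """
    if code[call_offset] != 0xE8:
        raise ValueError("not a call rel32 instruction")
    # Signed 32-bit little-endian displacement after the opcode byte.
    (rel,) = struct.unpack_from("<i", code, call_offset + 1)
    # The displacement is relative to the next instruction (5 bytes on).
    return base + call_offset + 5 + rel


call_bytes = bytes.fromhex("e898ba0000")
# 0x24 is the offset of the call within the listing above.
target = rel32_target(call_bytes, 0, base=0x24)
print(hex(target))  # 0xbac1
```

To get the real callee you would add the displacement to the runtime address of the instruction after the call, which the oops doesn't print directly; decoding against the kernel's own symbol information is the practical way to name it.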
So this function really ends with a call to a noreturn function with stack args. (32-bit x86 Linux kernels are built with -mregparm=3, so the first 3 args are in EAX, EDX, ECX in that order; either the callee is not regparm or it has more than 3 args. You can see that ex_handler_fprestore itself uses EAX and EDX as incoming args: it reads them before writing them.)
But it's not a jmp tailcall for some reason; maybe for exception backtracing it wants this function's stack frame on the stack. (Which might explain the push ebp / mov ebp,esp even if this kernel was built with -fomit-frame-pointer as part of -O2.)
You'd have to look at the C source for ex_handler_fprestore to guess why that might be.
ud2 is an illegal instruction: it is guaranteed to raise an invalid-opcode exception, which matches the "invalid opcode: 0000" line in the log. The compiler (or inline asm?) put it there so it would fault if the function returned. It's a clear sign that this path of execution is supposed to be unreachable, or is marked to intentionally trap as an assert() type of mechanism. (In Linux, look for BUG_ON().)
Source: https://stackoverflow.com/questions/57206372/what-is-code-in-linux-kernel-crash-messages