Question
I have written the following C code:
It simply allocates an array of 1000000 integers and another integer, and sets the first integer of the array to 0
I compiled this using gcc -g test.c -o test -fno-stack-protector
It gives a very weird disassembly:
Apparently it keeps allocating 4096 bytes on the stack in a loop, and "or"s every 4096th byte with 0 and then once it reaches 3997696 bytes, it then further allocates 2184 bytes. It then proceeds to set the 4000000th byte (which was never allocated) to 5.
Why doesn't it allocate the full 4000004 bytes that were requested? Why does it "or" every 4096th byte with 0, which is a useless instruction?
Am I understanding something wrong here?
NOTE: This was compiled with gcc version 9.3. gcc version 7.4 does not emit the loop or the "or" on every 4096th byte, but it also allocates only 3997696 + 2184 = 3999880 bytes and still sets the 4000000th byte to 5.
Answer 1:
This is a mitigation for the Stack Clash class of vulnerabilities, known since the 90s or earlier but only widely publicized in 2017. (See stack-clash.txt and this blog entry.)
If the attacker can arrange for a function with a VLA of attacker-controlled size to execute, or can arrange for a function with a large fixed-size array to execute when the attacker controls the amount of stack already used in some other way, they can cause the stack pointer to be adjusted to point into the middle of other memory, and thereby cause the function to clobber said memory, usually leading to arbitrary code execution.
The machine code GCC has emitted here is part of the Stack Clash Protection feature. Roughly, whenever the stack pointer is adjusted by more than the minimum page size, it is moved incrementally, one minimum-page-sized unit at a time, and memory is accessed after each adjustment. This ensures that, if at least one guard page (a page mapped PROT_NONE) is present, the access will fault and generate a signal before the adjustment into unrelated memory is made. The main thread always has guard pages, and by default newly created threads do too (and the size can be configured in the pthread thread creation attributes).
Answer 2:
There are two things here:
1. The "no-op" ORs read and write each page of the stack as it is allocated. These are required because the stack is usually mapped with one or more guard pages below it. Touching the guard page causes the stack to be expanded downward, but touching memory below the guard page would raise SIGSEGV.
2. The x86-64 System V ABI specifies a 128-byte red zone below the stack pointer. The compiler can freely use this area to store local variables too. If you add 128 to 3999880 (that is, 3997696 + 2184) you get 4000008. Note that the stack always has to be aligned to at least 8, not 4, so that any int64_t or double is aligned (as noted by Peter Cordes, larger arrays need to be 16-byte aligned, hence the requirement for the entire stack to be 16-byte aligned too), so 4000004 would be plain wrong!
Source: https://stackoverflow.com/questions/63402116/the-x86-disassembly-for-c-code-generates-orq-0x0-rsp