I am learning amd64 assembler, and trying to implement a simple Unix filter. For an unknown reason, even simplified to the bare minimum version (code below), it crashes at r
Use sub rsp, <size> to reserve stack space before touching it, if you're using more than 128 bytes below RSP.
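As a minimal sketch of what that looks like (NASM syntax, Linux x86-64; BUFSIZE and the single read call are my assumptions for illustration, not your actual code), reserve the buffer first and only then hand its address to the kernel:

```nasm
; Sketch: reserve a stack buffer by moving RSP before touching the memory.
; BUFSIZE is an assumed size chosen for illustration.
BUFSIZE equ 128*1024

section .text
global _start
_start:
    sub     rsp, BUFSIZE        ; the buffer is now [rsp, rsp+BUFSIZE), all at or above RSP

    xor     eax, eax            ; SYS_read = 0
    xor     edi, edi            ; fd 0 = stdin
    mov     rsi, rsp            ; buffer address
    mov     edx, BUFSIZE        ; maximum number of bytes to read
    syscall                     ; rax = bytes read, 0 at EOF, negative errno on error

    xor     edi, edi            ; exit status 0
    mov     eax, 60             ; SYS_exit = 60
    syscall
```

After the sub, every access to the buffer is at or above RSP, so neither the red-zone limit nor the kernel's stack-growth check is an issue.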
When it crashes, look at your process memory map (/proc/<PID>/maps on Linux). You might be using memory so far below RSP that the kernel doesn't grow the stack mapping, so it's just an ordinary access to an unmapped page: an invalid page fault, for which the kernel delivers SIGSEGV.
(The ABI only defines a 128-byte red zone, but in practice the only things that can clobber memory below RSP are a signal handler (which you didn't install) or GDB running print some_func(), which uses your program's stack to call a function in your program.)
Normally Linux is pretty willing to grow the stack mapping without touching intervening pages, but it apparently does check the value of RSP. Normally you move RSP instead of just using memory far below the stack pointer (because there's no guarantee that's safe). See "How is Stack memory allocated when using 'push' or 'sub' x86 instructions?"
Another duplicate: "Which exception can be generated when subtracting ESP or RSP register? (stack growing)", where using sub rsp, 5555555 before touching new stack memory was sufficient.
Stack ASLR might start RSP in different places relative to a page boundary, so you might be just barely getting away with it sometimes. Linux initially maps 132 KiB of stack space, and that includes space for the environment and args on the stack on entry to _start. Your 128 KiB is very close to that, so it's totally plausible that it randomly works sometimes.
And BTW, there's zero reason to actually copy memory in user-space, especially not 1 byte at a time. Just pass the same address to write.
Or at least filter in-place if possible, so your cache footprint is smaller.
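Here is a rough sketch of a pass-through loop that hands the same buffer to both read and write, with no user-space copy. The buffer size, labels, and exit-status convention are my own choices for illustration; an in-place filter would modify the bytes at [rsp] between the two system calls.

```nasm
; Sketch: read into a stack buffer, then write from the same address.
BUFSIZE equ 64*1024             ; assumed buffer size

section .text
global _start
_start:
    sub     rsp, BUFSIZE        ; reserve the buffer before touching it
.read:
    xor     eax, eax            ; SYS_read = 0
    xor     edi, edi            ; fd 0 = stdin
    mov     rsi, rsp            ; buffer address
    mov     edx, BUFSIZE
    syscall
    test    rax, rax
    jle     .done               ; 0 = EOF, negative = error
    mov     rbx, rax            ; bytes still to be written
    ; (an in-place filter would transform the bytes at [rsp] here)
.write:
    mov     eax, 1              ; SYS_write = 1
    mov     edi, 1              ; fd 1 = stdout
    mov     rdx, rbx            ; rsi still points at the unwritten data
    syscall
    test    rax, rax
    jle     .done               ; stop on error (or a zero-length write)
    add     rsi, rax            ; advance past what was written
    sub     rbx, rax
    jnz     .write              ; handle short writes
    jmp     .read
.done:
    xor     edi, edi            ; zero EDI before TEST so the flags survive
    test    rax, rax
    setnz   dil                 ; exit status 1 if the last syscall failed, else 0
    mov     eax, 60             ; SYS_exit = 60
    syscall
```

The syscall instruction itself only clobbers RAX, RCX and R11, which is why RSI and RBX can be carried across the calls.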
Also, the normal way to load a byte is movzx eax, byte [mem]. Only use mov al, [mem] if you specifically want to merge with the old value of RAX. On some CPUs, mov to al has a false dependency on the old value, which you can break by writing the full register.
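To make the difference concrete, here are the two forms side by side. The function names and the use of RDI as the pointer are hypothetical, just for illustration:

```nasm
section .text

; Hypothetical helper: returns the byte at [rdi], zero-extended into RAX.
load_byte:
    movzx   eax, byte [rdi]     ; writes all of RAX (upper bits zeroed), so no dependency on its old value
    ret

; Hypothetical helper: replaces only the low byte of RAX with the byte at [rdi].
load_byte_merge:
    mov     al, [rdi]           ; bits 8..63 of RAX keep their old value (a merge)
    ret
```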
And BTW, if your program always uses this space, you might as well statically allocate it in the BSS. That makes more efficient indexed addressing possible if you choose to assemble a position-dependent (non-PIE) executable.
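A sketch of the BSS version, assuming a static non-PIE build (for example nasm -felf64 filter.asm && ld -o filter filter.o, where filter.asm is a hypothetical filename). The buffer name, size, and using the first input byte as the exit status are only for illustration:

```nasm
BUFSIZE equ 128*1024            ; assumed buffer size

section .bss
buf:    resb BUFSIZE            ; zero-initialized, takes no space in the executable file

section .text
global _start
_start:
    xor     eax, eax            ; SYS_read = 0
    xor     edi, edi            ; fd 0 = stdin
    mov     esi, buf            ; 32-bit absolute address: non-PIE only
    mov     edx, BUFSIZE
    syscall                     ; rax = bytes read, 0 at EOF, negative errno on error

    xor     edi, edi            ; default exit status 0
    test    rax, rax
    jle     .exit               ; nothing was read
    xor     ecx, ecx            ; index 0
    movzx   edi, byte [buf + rcx] ; absolute disp32 plus index register: non-PIE only
.exit:
    mov     eax, 60             ; SYS_exit = 60
    syscall
```

In a PIE you would instead need something like lea rsi, [rel buf] and then index off that register, since RIP-relative addressing can't be combined with an index register.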
The red zone in amd64 is only 128 bytes long, but you're using 131072 bytes below rsp. Move the stack pointer down to encompass the buffers that you want to store on the stack.