I am trying to input four floats using scanf, store them onto the stack, and then use vmovupd to copy them over to a register for use. My problem is w
The problem is with your stack usage.
First, the ABI docs mandate that rsp be 16-byte aligned before a call. Since a call pushes an 8-byte return address onto the stack, you need to adjust rsp by a multiple of 16 plus 8 to get back to 16-byte alignment. The 16 * n + 8 total includes any push instructions or other changes to RSP, not just sub rsp, 24. This is the immediate cause of the segfault, because printf will use aligned SSE instructions, which fault on unaligned addresses.
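The 16 * n + 8 rule is plain arithmetic, so it can be checked with a small model. What follows is a minimal sketch in C, not code from the question; the helper name aligned_at_call is hypothetical. It assumes the x86-64 System V convention that RSP is 8 bytes past a 16-byte boundary on function entry, and that every push moves RSP by 8 bytes.

```c
/* Hypothetical helper: returns 1 if RSP is 16-byte aligned at a call site,
   given the number of 8-byte pushes and the bytes subtracted with
   `sub rsp, N` since function entry. Assumes RSP % 16 == 8 on entry. */
int aligned_at_call(unsigned pushes, unsigned sub_bytes)
{
    unsigned moved = 8u * pushes + sub_bytes;  /* total bytes RSP has moved down */
    return moved % 16u == 8u;                  /* must have the form 16 * n + 8 */
}
```

Under these assumptions, aligned_at_call(0, 24) returns 1 (a lone sub rsp, 24 is fine), while aligned_at_call(1, 24) returns 0: a single extra push before the call is enough to misalign the stack.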
If you fix that, your stack is still unbalanced, because you keep pushing values but never pop them. It's hard to understand what you want to do with the stack.
The usual way is to allocate space for the locals at the beginning of your function (the prologue) and free it at the end (the epilogue). As discussed above, this amount (including any pushes) should be a multiple of 16 plus 8, because RSP on function entry (after the caller's call) is 8 bytes away from a 16-byte boundary.
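To pick the amount to reserve in the prologue, you can work out the smallest value that holds your locals and still keeps the total at a multiple of 16 plus 8. The following C sketch shows that calculation; the helper name frame_adjust is hypothetical, and it assumes RSP % 16 == 8 on entry, 8 bytes per push, and locals that need no more than 8-byte alignment themselves.

```c
/* Hypothetical helper: smallest N for `sub rsp, N` that leaves room for
   `locals` bytes and keeps RSP 16-byte aligned at later calls.
   Assumes RSP % 16 == 8 on entry and 8 bytes per push. */
unsigned frame_adjust(unsigned pushes, unsigned locals)
{
    unsigned n = (locals + 7u) & ~7u;      /* round locals up to a multiple of 8 */
    if ((8u * pushes + n) % 16u != 8u)     /* total must have the form 16 * k + 8 */
        n += 8u;                           /* add one 8-byte slot of padding */
    return n;
}
```

For four floats (16 bytes) and no pushes, frame_adjust(0, 16) returns 24, which matches sub rsp, 24. With one register pushed first, frame_adjust(1, 16) returns 16. Whatever value the prologue subtracts, the epilogue should add back the same amount before ret so the stack stays balanced.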
In most builds of glibc, printf will only care about 16-byte stack alignment when AL != 0. (Because that means there are FP args, so it dumps all the XMM registers to the stack so it can index them for %f conversions.)
It's still a bug if you call it with a misaligned stack even if it happens to work on your system; a future glibc version could include code that depends on 16-byte stack alignment even without FP args. For example, scanf already does crash on misaligned stacks even with AL=0 on most GNU/Linux distros.