Heyo,
I have written this very basic main function to experiment with disassembly and also to see and hopefully understand what is going on at the lower level:
The code at the beginning of the function body:
push %ebp
mov %esp, %ebp
is to create the so-called stack frame, which is a "solid ground" for referencing parameters and objects local to the procedure. The %ebp register is used (as its name indicates) as a base pointer, which points to the base (or bottom) of the local stack inside the procedure.
After entering the procedure, the stack pointer register (%esp) points to the return address stored on the stack by the call instruction (it is the address of the instruction just after the call). If you just invoked ret now, this address would be popped from the stack into %eip (the instruction pointer) and execution would continue from that address (the next instruction after the call). But we don't return yet, do we? ;-)
You then push the %ebp register to save its previous value so you don't lose it, because you're about to reuse the register for something else. (BTW, it usually contains the base pointer of the caller function, and when you peek at the memory it points to, you'll find a previously stored %ebp, which is again the base pointer of the function one level higher, so you can trace the call stack that way.) Once %ebp is saved, you can store the current %esp (stack pointer) in it, so that %ebp points to the same address: the base of the current local stack. %esp will move back and forth inside the procedure as you push and pop values on the stack or reserve and free local variables. But %ebp will stay fixed, still pointing to the base of the local stack frame.
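To make the bookkeeping above easier to follow, here is a minimal sketch of it in C. It is only a toy model, not real machine code: the stack is an array of 4-byte words, esp and ebp are plain byte offsets into it, and every name in it (Machine, push, load, call_and_prologue, count_frames) is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: the stack is an array of 4-byte words; esp and ebp are
   byte addresses into it. All names here are illustrative. */
enum { STACK_BYTES = 64 };

typedef struct {
    uint32_t mem[STACK_BYTES / 4]; /* simulated stack memory */
    uint32_t esp;                  /* stack pointer */
    uint32_t ebp;                  /* base pointer */
} Machine;

/* push: move the top of the stack down by one word, then store there. */
static void push(Machine *m, uint32_t value) {
    assert(m->esp >= 4);
    m->esp -= 4;                   /* the stack grows towards lower addresses */
    m->mem[m->esp / 4] = value;
}

/* Read the word stored at a given stack address. */
static uint32_t load(const Machine *m, uint32_t addr) {
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return m->mem[addr / 4];
}

/* Models "call f" followed by f's first two instructions. */
static void call_and_prologue(Machine *m, uint32_t return_address) {
    push(m, return_address);       /* done by the call instruction */
    push(m, m->ebp);               /* push %ebp */
    m->ebp = m->esp;               /* mov %esp, %ebp */
}

/* Walks the chain of saved base pointers, the way a debugger traces the
   call stack, until it reaches the outermost base. */
static int count_frames(const Machine *m, uint32_t outermost_ebp) {
    int frames = 0;
    uint32_t base = m->ebp;
    while (base != outermost_ebp) {
        base = load(m, base);      /* (%ebp) holds the caller's base pointer */
        frames++;
    }
    return frames;
}
```

If you start with esp and ebp both set to STACK_BYTES and call call_and_prologue twice, count_frames(&m, STACK_BYTES) walks back through both saved base pointers, and any push in between moves esp but leaves ebp alone.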
Parameters passed to the procedure by the caller are "buried just under the ground" (that is, they have positive offsets relative to the base, because the stack grows down). %ebp holds the address of the base of the local stack, where the previous value of %ebp lies. Below it (that is, at 4(%ebp)) lies the return address. Each of these slots is 4 bytes wide in 32-bit code, so the first parameter will be at 8(%ebp), the second at 12(%ebp), and so on.
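The offsets can be checked with a small C sketch. Again this is a toy model with made-up names (Machine, push, load, call_with_two_args, param), and it assumes the caller pushes its arguments right to left, as the usual 32-bit C calling convention does.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 32-bit stack; all names are illustrative. */
enum { STACK_BYTES = 64 };

typedef struct {
    uint32_t mem[STACK_BYTES / 4]; /* simulated stack memory */
    uint32_t esp;                  /* stack pointer (byte address) */
    uint32_t ebp;                  /* base pointer (byte address) */
} Machine;

static void push(Machine *m, uint32_t value) {
    assert(m->esp >= 4);
    m->esp -= 4;                   /* the stack grows down */
    m->mem[m->esp / 4] = value;
}

static uint32_t load(const Machine *m, uint32_t addr) {
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return m->mem[addr / 4];
}

/* Models a caller pushing two arguments right to left (an assumption),
   then the call instruction, then the callee's prologue. */
static void call_with_two_args(Machine *m, uint32_t first, uint32_t second,
                               uint32_t return_address) {
    push(m, second);               /* last argument is pushed first */
    push(m, first);
    push(m, return_address);       /* done by the call instruction */
    push(m, m->ebp);               /* push %ebp */
    m->ebp = m->esp;               /* mov %esp, %ebp */
}

/* Parameter number index (0-based) as the callee sees it:
   8(%ebp), 12(%ebp), and so on. */
static uint32_t param(const Machine *m, uint32_t index) {
    return load(m, m->ebp + 8 + 4 * index);
}
```

After call_with_two_args(&m, 11, 22, 0x1000) on a machine whose esp and ebp start at STACK_BYTES, param(&m, 0) gives back 11 and param(&m, 1) gives back 22, with the saved %ebp at offset 0 and the return address at offset 4.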
And local variables can be allocated on the stack above the base (that is, they have negative offsets relative to the base). Just subtract N from %esp and you've allocated N bytes on the stack for local variables, by moving the top of the stack above (or, precisely, below) this region :-) You can refer to this area by negative offsets relative to %ebp, e.g. -4(%ebp) is the first word, -8(%ebp) is the second, etc. Remember that %ebp points to the base of the local stack, and (%ebp) is where the previous %ebp value has been saved. So remember to restore the stack to its previous position before you try to restore %ebp through pop %ebp at the end of the procedure. You can do it in two ways:
1. You can free only the local variables by adding N back to %esp (stack pointer), that is, moving the top of the stack as if these local variables had never been there. (Well, their values will stay on the stack, but they'll be considered "freed" and could be overwritten by subsequent pushes, so it's no longer safe to refer to them. They're dead bodies ;-J )
2. You can flush the stack down to the ground and free all local space by simply restoring %esp from %ebp, which was fixed earlier to the base of the stack. This restores the stack pointer to the state it had just after entering the procedure and saving %esp into %ebp. It's like loading a previously saved game when you've messed something up ;-)
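The two ways of cleaning up can be compared in a small C sketch. It is a toy model under the same assumptions as the rest of this answer (32-bit words, stack growing down), and the names Machine, push, pop, enter_frame, leave_by_add and leave_by_mov are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 32-bit stack; all names are illustrative. */
enum { STACK_BYTES = 64 };

typedef struct {
    uint32_t mem[STACK_BYTES / 4]; /* simulated stack memory */
    uint32_t esp;                  /* stack pointer (byte address) */
    uint32_t ebp;                  /* base pointer (byte address) */
} Machine;

static void push(Machine *m, uint32_t value) {
    assert(m->esp >= 4);
    m->esp -= 4;
    m->mem[m->esp / 4] = value;
}

static uint32_t pop(Machine *m) {
    assert(m->esp % 4 == 0 && m->esp < STACK_BYTES);
    uint32_t value = m->mem[m->esp / 4];
    m->esp += 4;                   /* the old value stays in memory, but is "freed" */
    return value;
}

/* call; push %ebp; mov %esp, %ebp; sub $N, %esp */
static void enter_frame(Machine *m, uint32_t return_address,
                        uint32_t n_local_bytes) {
    push(m, return_address);
    push(m, m->ebp);
    m->ebp = m->esp;
    assert(m->esp >= n_local_bytes);
    m->esp -= n_local_bytes;       /* reserve N bytes for locals */
}

/* Way 1: add $N, %esp; pop %ebp; ret.
   Only correct if every other push has already been popped. */
static uint32_t leave_by_add(Machine *m, uint32_t n_local_bytes) {
    m->esp += n_local_bytes;
    m->ebp = pop(m);
    return pop(m);                 /* ret: this value would go into %eip */
}

/* Way 2: mov %ebp, %esp; pop %ebp; ret.
   Works no matter where %esp has wandered to in the meantime. */
static uint32_t leave_by_mov(Machine *m) {
    m->esp = m->ebp;
    m->ebp = pop(m);
    return pop(m);                 /* ret: this value would go into %eip */
}
```

Both functions hand back the return address and leave esp and ebp as they were before the call. The difference shows when something extra is still sitting on the stack: leave_by_mov simply discards it, while leave_by_add would pop the wrong words.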
It's possible to get less messy assembly from gcc -S by adding the switch -fomit-frame-pointer. It tells GCC not to generate any code for setting up and tearing down the stack frame unless it's really needed for something. Just remember that it can confuse debuggers, because they usually depend on the stack frame being there to be able to walk up the call stack. But it won't break anything if you don't need to debug this binary. It's perfectly fine for release targets and it saves some space and time.
Sometimes you'll come across strange assembler directives starting with .cfi interleaved with the function header. This is the so-called Call Frame Information. It's used by debuggers to track function calls. But it's also used for exception handling in high-level languages, which needs stack unwinding and other call-stack-based manipulations. You can turn these directives off in your assembly too, by adding the switch -fno-dwarf2-cfi-asm. This tells GCC to use plain old labels instead of those strange .cfi directives, and it adds special data structures at the end of your assembly, referring to those labels. This doesn't turn off the CFI, it just changes the format to a more "transparent" one: the CFI tables are then visible to the programmer.
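If you want to see the effect of these switches yourself, a tiny function like the one below is enough to experiment with. The file name frame.c and the output file names are just placeholders, and the -m32 flag assumes your GCC installation has 32-bit support; without it you'll get the native (probably 64-bit) flavour of the same ideas.

```c
/* frame.c: a tiny function to inspect with gcc -S.
 *
 * Commands to try (file names are placeholders; drop -m32 if your
 * GCC has no 32-bit support installed):
 *
 *   gcc -m32 -O0 -S -fno-omit-frame-pointer frame.c -o with_fp.s
 *   gcc -m32 -O0 -S -fomit-frame-pointer    frame.c -o without_fp.s
 *   gcc -m32 -O0 -S -fno-dwarf2-cfi-asm     frame.c -o plain_labels.s
 */
int sum(int a, int b) {
    int result = a + b;  /* a local variable, so the frame has something in it */
    return result;
}
```

Comparing the generated .s files side by side should show where the frame setup and the .cfi directives come and go; the exact instructions depend on your GCC version and target.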