How exactly does the callstack work?

庸人自扰 2020-11-30 16:44

I'm trying to get a deeper understanding of how the low-level operations of programming languages work, and especially how they interact with the OS/CPU. I've probably read

7 answers
  • 2020-11-30 16:55

    Because obviously, the next thing we need is to work with a and b but that would mean that the OS/CPU (?) has to pop out d and c first to get back to a and b. But then it would shoot itself in the foot because it needs c and d in the next line.

    In short:

    There is no need to pop the arguments. The arguments passed by caller foo to function doSomething and the local variables in doSomething can all be referenced as an offset from the base pointer.
    So,

    • When a function call is made, the function's arguments are pushed onto the stack. They are then referenced relative to the base pointer.
    • When the function returns to its caller, its arguments are popped from the stack in LIFO order.

    In detail:

    The rule is that each function call results in the creation of a stack frame (the minimum being the address to return to). So, if funcA calls funcB and funcB calls funcC, three stack frames are set up, one on top of another. When a function returns, its frame becomes invalid. A well-behaved function acts only on its own stack frame and does not trespass on another's. In other words, popping is performed on the stack frame at the top (when returning from the function).


    The stack in your question is set up by the caller foo. When doSomething and doAnotherThing are called, they set up their own stack frames. The figure may help you to understand this:

    [Figure: stack frames of foo, doSomething, and doAnotherThing]

    Note that, to access the arguments, the function body has to traverse down the stack (towards higher addresses) from the location where the return address is stored, and to access the local variables, it has to traverse up the stack (towards lower addresses) relative to that location. In fact, typical compiler-generated code for the function does exactly this. The compiler dedicates a register called EBP (base pointer) to this purpose. Another name for it is frame pointer. As the first thing in the function body, the compiler typically pushes the current EBP value onto the stack and sets EBP to the current ESP. Once this is done, in any part of the function code, argument 1 is at EBP+8 (4 bytes each for the caller's EBP and the return address), argument 2 is at EBP+12 (decimal), and the n-th local variable is at EBP-4n.

    .
    .
    .
    [ebp - 4]  (1st local variable)
    [ebp]      (old ebp value)
    [ebp + 4]  (return address)
    [ebp + 8]  (1st argument)
    [ebp + 12] (2nd argument)
    [ebp + 16] (3rd function argument) 
    

    Consider the following C function to see how its stack frame is formed:

    void MyFunction(int x, int y, int z)
    {
         int a, b, c;
         ...
    }
    

    When the caller calls it

    MyFunction(10, 5, 2);  
    

    the following code will be generated:

    push 2            ; Push third argument (z); arguments go right to left
    push 5            ; Push second argument (y)
    push 10           ; Push first argument (x)
    call _MyFunction  ; Equivalent to:
                      ;   push <address of the next instruction>
                      ;   jmp _MyFunction
    

    and the assembly code at the start of the function (the prologue, set up by the callee on entry) will be

    _MyFunction:
      push ebp        ; save the caller's base pointer
      mov ebp, esp    ; establish this function's frame
      sub esp, 12     ; sizeof(a) + sizeof(b) + sizeof(c)
      ; x = [ebp + 8], y = [ebp + 12], z = [ebp + 16]
      ; a = [ebp - 4] = [esp + 8], b = [ebp - 8] = [esp + 4], c = [ebp - 12] = [esp]
     
    

    References:

    • Function Call Conventions and the Stack.
    • Frame Pointer and Local Variables.
    • x86 Disassembly/Functions and Stack Frames.
  • 2020-11-30 17:02

    The call stack is not actually a stack data structure. Behind the scenes, the computers we use are implementations of the random access machine architecture. So, a and b can be directly accessed.

    Behind the scenes, the machine does:

    • Getting "a" means reading the value of the fourth element below the stack top.
    • Getting "b" means reading the value of the third element below the stack top.

    http://en.wikipedia.org/wiki/Random-access_machine

  • 2020-11-30 17:07

    There are already some really good answers here. However, if you are still concerned about the LIFO behavior of the stack, think of it as a stack of frames, rather than a stack of variables. What I mean to suggest is that, although a function may access variables that are not on the top of the stack, it is still only operating on the item at the top of the stack: a single stack frame.

    Of course, there are exceptions to this. The local variables of the entire call chain are still allocated and available. But they won't be accessed directly. Instead, they are passed by reference (or by pointer, which is really only different semantically). In this case a local variable of a stack frame much further down can be accessed. But even in this case, the currently executing function is still only operating on its own local data. It is accessing a reference stored in its own stack frame, which may be a reference to something on the heap, in static memory, or further down the stack.

    This is the part of the stack abstraction that makes functions callable in any order, and allows recursion. The top stack frame is the only object that is directly accessed by the code. Anything else is accessed indirectly (through a pointer that lives in the top stack frame).

    It might be instructive to look at the assembly of your little program, especially if you compile without optimization. I think you will see that all of the memory access in your function happens through an offset from the stack frame pointer, which is how the compiler writes the code for the function. In the case of a pass by reference, you would see indirect memory access instructions through a pointer that is stored at some offset from the stack frame pointer.

  • 2020-11-30 17:07

    Here is a diagram I created for C's call stack. It's more accurate and contemporary than the Google Images versions.

    And corresponding to the exact structure of the above diagram, here is a debug of notepad.exe x64 on Windows 7.

    The low addresses and high addresses are swapped so the stack is climbing upwards in this diagram. Red indicates the frame exactly as in the first diagram (which used red and black, but black has now been repurposed); black is the home space; blue is the return address, which is an offset into the caller function to the instruction after the call; orange is the alignment; and pink is where the instruction pointer is pointing right after the call and before the first instruction. The home space plus the return address is the smallest allowed frame on Windows, and as the 16-byte rsp alignment right at the start of the called function must be maintained, this always includes an 8-byte alignment as well. Because these functions do not require any stack locals (because they can be optimised into registers) or stack parameters/return values (as they fit in registers) and do not use any of the other fields, the stack frames are all home space + return address + alignment in size. The first frame is of BaseThreadInitThunk and so on.

    The red function frames outline what the callee function logically 'owns' + reads / modifies (it can modify a parameter passed on the stack that was too big to pass in a register on -Ofast). The green lines demarcate the space the function allocates itself from the beginning to the end of the function.

  • 2020-11-30 17:08

    As others noted, there is no need to pop parameters until they go out of scope.

    I will paste an example from "Pointers and Memory" by Nick Parlante. I think the situation is a bit simpler than you envisioned.

    Here is code:

    void Y(int p);  /* declared before use so that X can call it */

    void X() 
    {
      int a = 1;
      int b = 2;
    
      // T1
      Y(a);
    
      // T3
      Y(b);
    
      // T5
    }
    
    void Y(int p) 
    {
      int q;
      q = p + 2;
      // T2 (first time through), T4 (second time through)
    }
    

    The points in time T1, T2, etc. are marked in the code and the state of memory at that time is shown in the drawing:

    [Figure: state of memory at times T1 through T5]

  • 2020-11-30 17:11

    The call stack could also be called a frame stack.
    The things that are stacked according to the LIFO principle are not the local variables but the entire stack frames ("calls") of the functions being called. The local variables are pushed and popped together with those frames in the so-called function prologue and epilogue, respectively.

    Inside the frame the order of the variables is completely unspecified; compilers "reorder" the positions of local variables inside a frame appropriately to optimize their alignment so the processor can fetch them as quickly as possible. The crucial fact is that the offset of the variables relative to some fixed address is constant throughout the lifetime of the frame, so it suffices to take an anchor address, say, the address of the frame itself, and work with offsets from that address to the variables. Such an anchor address is actually contained in the so-called base or frame pointer, which is stored in the EBP register. The offsets, on the other hand, are known at compile time and are therefore hardcoded into the machine code.

    This graphic from Wikipedia shows how a typical call stack is structured (see footnote 1):

    [Figure: picture of a call stack]

    Add the offset of a variable we want to access to the address contained in the frame pointer and we get the address of our variable. In short, the code just accesses the variables directly via constant compile-time offsets from the base pointer; it's simple pointer arithmetic.

    Example

    #include <iostream>
    
    int main()
    {
        char c = std::cin.get();
        std::cout << c;
    }
    

    gcc.godbolt.org gives us

    main:
        pushq   %rbp
        movq    %rsp, %rbp
        subq    $16, %rsp
    
        movl    std::cin, %edi
        call    std::basic_istream<char, std::char_traits<char> >::get()
        movb    %al, -1(%rbp)
        movsbl  -1(%rbp), %eax
        movl    %eax, %esi
        movl    std::cout, %edi
        call    [... the insertion operator for char, long thing... ]
    
        movl    $0, %eax
        leave
        ret
    

    .. for main. I divided the code into three subsections. The function prologue consists of the first three operations:

    • Base pointer is pushed onto the stack.
    • The stack pointer is saved in the base pointer.
    • The stack pointer is subtracted to make room for local variables.

    Then cin is moved into the EDI register (see footnote 2) and get is called; the return value is in EAX.

    So far so good. Now the interesting thing happens:

    The low-order byte of EAX, designated by the 8-bit register AL, is taken and stored in the byte directly below the address held in the base pointer, that is, at -1(%rbp): the offset from the base pointer is -1. This byte is our variable c. The offset is negative because the stack grows downwards on x86. The next instruction (movsbl) loads c back into EAX with sign extension; EAX is then moved to ESI, cout is moved to EDI, and the insertion operator is called with cout and c as its arguments.

    Finally,

    • The return value of main, 0, is stored in EAX. That is because of the implicit return statement. You might also see xorl %eax, %eax instead of movl.
    • leave is executed and ret returns to the call site. leave abbreviates the epilogue and implicitly
      • Replaces the stack pointer with the base pointer and
      • Pops the base pointer.

    After leave and ret have been performed, the frame has effectively been popped, although the caller still has to clean up the arguments as we're using the cdecl calling convention. Other conventions, e.g. stdcall, require the callee to tidy up, e.g. by passing the number of bytes to ret.

    Frame Pointer Omission

    It is also possible not to use offsets from the base/frame pointer but from the stack pointer (ESP) instead. This makes the EBP register that would otherwise contain the frame pointer value available for arbitrary use, but it can make debugging impossible on some machines, and will be implicitly turned off for some functions. It is particularly useful when compiling for processors with only a few registers, including x86.

    This optimization is known as FPO (frame pointer omission) and is set by -fomit-frame-pointer in GCC and Clang and by /Oy in MSVC; note that it is implicitly triggered by every optimization level > 0 if and only if debugging is still possible, since it doesn't have any costs apart from that. For further information see here and here.


    1 As pointed out in the comments, the frame pointer is presumably meant to point to the address after the return address.

    2 Note that the registers that start with R are the 64-bit counterparts of the ones that start with E. EAX designates the four low-order bytes of RAX. I used the names of the 32-bit registers for clarity.
