Inline assembly - cdecl and preparing the stack

Question

I've recently been trying to implement dynamic functions in C++ by filling a buffer with the raw hexadecimal encodings of assembly instructions. To illustrate with a simple jump:

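byte *buffer = new byte[5];
buffer[0] = 0xE9; // opcode for jmp rel32 (near relative jump)
// rel32 is measured from the end of the 5-byte instruction; `destination` is a placeholder address
*(uint *)(buffer + 1) = (uint)destination - ((uint)buffer + 5);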
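
As a hedged illustration of the encoding above, here is a minimal sketch that writes a `jmp rel32` into a byte buffer. The helper name `write_jmp_rel32` and its parameters are hypothetical, and addresses are passed as plain 32-bit integers so the arithmetic can be checked without executing the generated bytes. The `call` instructions in the listings further down (opcode `e8`) use the same displacement rule, which makes them a convenient cross-check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: encode `jmp rel32` as it would appear at address
// `from`, jumping to address `to`, into `out`.
void write_jmp_rel32(std::uint8_t *out, std::size_t capacity,
                     std::uint32_t from, std::uint32_t to) {
    assert(capacity >= 5 && "jmp rel32 occupies 5 bytes");
    out[0] = 0xE9; // opcode for jmp rel32
    // The displacement is relative to the end of the instruction (from + 5).
    // Unsigned subtraction wraps modulo 2^32, which yields the two's
    // complement encoding for backward jumps.
    const std::uint32_t rel = to - (from + 5);
    // x86 is little-endian: store the least significant byte first.
    for (int i = 0; i < 4; ++i) {
        out[1 + i] = static_cast<std::uint8_t>(rel >> (8 * i));
    }
}
```

For the addresses of the first `call` in the listing below (instruction at 0x8048a34, target 0x80489fb) this produces the displacement bytes `c2 ff ff ff`, matching the disassembly. In real use `from` would be the address of the buffer itself, and the buffer would have to live in executable memory, which plain `new` does not guarantee.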

I am not experienced in assembly, but I know enough to create very simple functions. Right now I'm creating cdecl functions in raw memory. The problem is that I do not know how far to move the stack pointer down (to reserve memory) with sub. Let's take this function as an example:

int MyTest(int x, int y) { return x + y; }

long TheTest(int x, int y)
{
    return MyTest(x, 5);
}

08048a20 <_Z6TheTestii>:
_Z6TheTestii():
 8048a20:   55                      push   %ebp
 8048a21:   89 e5                   mov    %esp,%ebp
 8048a23:   83 ec 18                sub    $0x18,%esp
 8048a26:   c7 44 24 04 05 00 00    movl   $0x5,0x4(%esp)
 8048a2d:   00 
 8048a2e:   8b 45 08                mov    0x8(%ebp),%eax
 8048a31:   89 04 24                mov    %eax,(%esp)
 8048a34:   e8 c2 ff ff ff          call   80489fb <_Z6MyTestii>
 8048a39:   c9                      leave  
 8048a3a:   c3                      ret

As you can see, the C++ code comes first, followed by the disassembly of the 'TheTest' function. The stack pointer is immediately moved down by 24 (0x18) bytes (as mentioned, I am not experienced with assembly, so my terminology may be off). This does not make sense to me. Why are 24 bytes required when only two integers are involved? The variable 'x' takes 4 bytes and the value '5' takes another 4 (remember it's cdecl, so the calling function takes care of the memory for the function arguments); that does not add up to 24.

Now here is an additional example which makes me really wonder about the assembly output:

int NewTest(int x, char val) { return x + val; }

long TheTest(int x, int y)
{
    return NewTest(x, (char)6);
}

08048a3d <_Z6TheTestiiii>:
_Z6TheTestiiii():
 8048a3d:   55                      push   %ebp
 8048a3e:   89 e5                   mov    %esp,%ebp
 8048a40:   83 ec 08                sub    $0x8,%esp
 8048a43:   c7 44 24 04 06 00 00    movl   $0x6,0x4(%esp)
 8048a4a:   00 
 8048a4b:   8b 45 08                mov    0x8(%ebp),%eax
 8048a4e:   89 04 24                mov    %eax,(%esp)
 8048a51:   e8 ca ff ff ff          call   8048a20 <_Z7NewTestic>
 8048a56:   c9                      leave  
 8048a57:   c3                      ret

The only difference here (apart from the values) is that I use a 'char' (1 byte) instead of an integer. In the assembly, the stack pointer now moves by only 8 bytes, which is 16 bytes less than in the previous example. As an out-and-out C++ person, I have no clue what's going on. I would really appreciate it if someone could enlighten me on the subject!

NOTE: I'm posting here instead of reading an ASM book because I need assembly for this one function only, and I don't want to read a whole book for 40 lines of code...

EDIT: I also do not care about platform independence; I only care about 32-bit Linux :)

Answer 1:

The stack frame created in TheTest holds both local (automatic) variables and arguments to functions, such as MyTest and NewTest, called by TheTest. The frame is pushed and popped by TheTest, so as long as it is big enough to hold the arguments to the functions it calls, the size doesn't matter much.
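
To make the frame setup concrete for code that is generated into a buffer, here is a small sketch that emits the same prologue and epilogue bytes as the listings in the question (`55`, `89 e5`, `83 ec xx`, then `c9`, `c3`). The function names `emit_prologue` and `emit_epilogue` are hypothetical, and the sketch only covers frame sizes that fit the one-byte immediate form of `sub` used in those listings.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: bytes for
//   push %ebp ; mov %esp,%ebp ; sub $frame_bytes,%esp
std::vector<std::uint8_t> emit_prologue(std::uint8_t frame_bytes) {
    // The 83 EC ib form sign-extends its 8-bit immediate, so values above
    // 0x7F would need the longer 81 EC id encoding, not covered here.
    assert(frame_bytes <= 0x7F);
    return {0x55,                     // push %ebp
            0x89, 0xE5,               // mov  %esp,%ebp
            0x83, 0xEC, frame_bytes}; // sub  $frame_bytes,%esp
}

// Hypothetical helper: bytes for leave ; ret.
// `leave` restores %esp from %ebp and pops %ebp, so the epilogue does not
// need to know the frame size chosen in the prologue.
std::vector<std::uint8_t> emit_epilogue() {
    return {0xC9,  // leave
            0xC3}; // ret
}
```

Because `leave` undoes whatever `sub` reserved, a frame that is larger than strictly necessary costs stack space but does not affect correctness.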

The compiler output you are seeing is the result of several passes of the compiler. Each pass may perform transformations and optimizations that reduce the frame size required; I suspect that at some early stage the compiler needed 24 bytes of frame and never reduced it, even though the code was optimized.

The ABI of the compiler on your platform will establish some rules about stack alignment that you must follow, so frame sizes are rounded up to meet these requirements.
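
The rounding can be sketched as a small calculation. This assumes a 16-byte stack alignment at the point of each `call`, which I believe is GCC's default on 32-bit Linux but is an assumption here, not something the listings state. The name `frame_size` is hypothetical. On entry the return address (4 bytes) and the saved `%ebp` (4 bytes) are already on the stack, so it is the frame plus those 8 bytes that has to be a multiple of the alignment.

```cpp
#include <cassert>

// Hypothetical helper: smallest value for `sub $N,%esp` that holds `needed`
// bytes and leaves %esp aligned at the next call.
// Assumption: `alignment` is 16, the presumed GCC default on i386 Linux.
constexpr unsigned frame_size(unsigned needed, unsigned alignment = 16) {
    // Return address + saved %ebp are pushed before the `sub` executes.
    const unsigned already_pushed = 8;
    const unsigned total = needed + already_pushed;
    // Round the total up to the next multiple of `alignment`.
    const unsigned rounded = (total + alignment - 1) / alignment * alignment;
    return rounded - already_pushed;
}

// Two 4-byte outgoing arguments need 8 bytes: 8 + 8 = 16, already aligned.
static_assert(frame_size(8) == 8, "matches the second listing");
// Anything from 9 to 24 bytes rounds up to 24: 24 + 8 = 32.
static_assert(frame_size(24) == 24, "matches the first listing");
```

Under that assumption both `sub $0x8` and `sub $0x18` are valid frame sizes for a function that passes two 4-byte arguments; the calculation does not explain why the compiler picked the larger one in the first listing.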

These functions use the frame pointer %ebp. That is not a win in code size or performance, though it may aid debugging.

Answer 2:

It looks to me like your compiler is making a mistake in the first function (probably a missed stack-usage optimization). It's also odd that your compiler uses two instructions (a move to a pre-allocated stack slot) rather than a single push instruction.

Are you compiling without optimization? Could you post your compiler command line?

Answer 3:

This keeps the stack aligned to a multiple of 16 bytes so that SIMD instructions can be used with variables on the stack.

Answer 4:

Prologue and epilogue code is being inserted into these functions. Try writing your assembly in naked functions (the example below uses MSVC syntax), i.e.

__declspec( naked ) void UsernameIdTramp() // 10 bytes: 5 saved bytes + 5 bytes for the trampoline jump
{
    __asm 
    {  
        nop; nop; nop; nop; nop;   // 5 bytes copied from target - 
        nop; nop; nop; nop; nop;   // 5 bytes for the jump back.
    }
}

Source: https://stackoverflow.com/questions/10463960/inline-assembly-cdecl-and-preparing-the-stack

Tags

c++

Linux

gcc

assembly

32-bit