OSX 64 bit C++ Disassembly line by line

Submitted by 强颜欢笑 on 2019-12-25 07:18:33

Question


I have been reading through the following series of articles: http://www.altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c

The disassembly shown in the articles differs significantly from what I get when I build and run the same code, and I don't understand enough to explain the differences.

Can anyone step through it line by line and explain what each step does? From my searching, I gather that the first few lines have something to do with frame pointers. There also seem to be a few extra lines in my disassembly that make sure registers are empty before new values are placed in them (these are absent from the code in the article).

I am running this on OSX (the original author is using Windows) using the g++ compiler from within Xcode 4. I have no idea whether these differences are due to the OS, the architecture (32 bit vs 64 bit, maybe?) or the compiler itself. It could even be the code, I guess: mine is wrapped inside the main function declaration, whereas the original code makes no mention of this.

My code:

int main(int argc, const char * argv[])
{

    int x = 1;
    int y = 2;
    int z = 0;

    z = x + y;

}

My disassembled code:

0x100000f40:  pushq  %rbp
0x100000f41:  movq   %rsp, %rbp
0x100000f44:  movl   $0, %eax
0x100000f49:  movl   %edi, -4(%rbp)
0x100000f4c:  movq   %rsi, -16(%rbp)
0x100000f50:  movl   $1, -20(%rbp)
0x100000f57:  movl   $2, -24(%rbp)
0x100000f5e:  movl   $0, -28(%rbp)
0x100000f65:  movl   -20(%rbp), %edi
0x100000f68:  addl   -24(%rbp), %edi
0x100000f6b:  movl   %edi, -28(%rbp)
0x100000f6e:  popq   %rbp
0x100000f6f:  ret    

The disassembled code from the original article:

mov    dword ptr [ebp-8],1
mov    dword ptr [ebp-14h],2
mov    dword ptr [ebp-20h],0
mov    eax, dword ptr [ebp-8]
add    eax, dword ptr [ebp-14h]
mov    dword ptr [ebp-20h],eax

A full line-by-line breakdown would be extremely enlightening, but any help in understanding this would be appreciated.


Answer 1:


There are two major differences between your disassembled code and the article's code.

One is that the article is using the Intel assembler syntax, while your disassembled code is using the traditional Unix/AT&T assembler syntax. Some differences between the two are documented on Wikipedia.

The other difference is that the article omits the function prologue, which sets up the stack frame, and the function epilogue, which destroys the stack frame and returns to the caller. The program he's disassembling has to contain instructions to do those things, but his disassembler isn't showing them. (Actually the stack frame could and probably would be omitted if the optimizer were enabled, but it's clearly not enabled.)
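To make the prologue and epilogue concrete, here is a minimal sketch in C++ that mimics what pushq %rbp, movq %rsp, %rbp and popq %rbp do to the two registers. The Machine struct, run_frame function and the addresses used with them are hypothetical names invented for illustration; this is a toy model, not how a debugger or the CPU is implemented. Note that the question's code never subtracts from %rsp; that is most likely because main calls nothing else and the x86-64 System V ABI lets such leaf functions use the 128-byte "red zone" below %rsp for locals.

```cpp
#include <cstdint>
#include <map>

// Toy model of the registers and stack memory touched by the
// prologue/epilogue. All names here are hypothetical.
struct Machine {
    std::uint64_t rsp;  // stack pointer
    std::uint64_t rbp;  // frame (base) pointer
    std::map<std::uint64_t, std::uint64_t> memory;  // address -> 8-byte value

    // push: the stack grows downward, so decrement first, then store.
    void push(std::uint64_t value) { rsp -= 8; memory[rsp] = value; }

    // pop: load the value at the top of the stack, then increment.
    std::uint64_t pop() {
        std::uint64_t value = memory[rsp];
        rsp += 8;
        return value;
    }
};

// Mirrors: pushq %rbp ; movq %rsp, %rbp ; (body) ; popq %rbp
// Returns the frame pointer that was active inside the function body.
std::uint64_t run_frame(Machine& m) {
    m.push(m.rbp);                // pushq %rbp      : save caller's frame pointer
    m.rbp = m.rsp;                // movq %rsp, %rbp : this function's frame starts here
    std::uint64_t frame = m.rbp;  // locals live at negative offsets from this address
    m.rbp = m.pop();              // popq %rbp       : restore caller's frame pointer
    return frame;                 // a real ret would now pop the return address
}
```

Calling run_frame on a Machine whose rsp is 0x7fff0000 gives a frame pointer 8 bytes lower during the body, and leaves both rsp and rbp exactly as they were on entry, which is the whole point of the prologue/epilogue pair.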

There are also some minor differences: your code is using a slightly different layout for local variables, and your code is computing the sum in a different register.

On the Mac, g++ doesn't support emitting Intel mnemonics, but clang does:

:; clang -S -mllvm --x86-asm-syntax=intel t.c
:; cat t.s
    .section    __TEXT,__text,regular,pure_instructions
    .globl  _main
    .align  4, 0x90
_main:                                  ## @main
    .cfi_startproc
## BB#0:
    push    RBP
Ltmp2:
    .cfi_def_cfa_offset 16
Ltmp3:
    .cfi_offset rbp, -16
    mov RBP, RSP
Ltmp4:
    .cfi_def_cfa_register rbp
    mov EAX, 0
    mov DWORD PTR [RBP - 4], EDI
    mov QWORD PTR [RBP - 16], RSI
    mov DWORD PTR [RBP - 20], 1
    mov DWORD PTR [RBP - 24], 2
    mov DWORD PTR [RBP - 28], 0
    mov EDI, DWORD PTR [RBP - 20]
    add EDI, DWORD PTR [RBP - 24]
    mov DWORD PTR [RBP - 28], EDI
    pop RBP
    ret
    .cfi_endproc


.subsections_via_symbols

If you add the -g flag, the compiler will add debug information including source filenames and line numbers. It's too big to put here in its entirety, but this is the relevant part:

    .loc    1 4 14 prologue_end     ## t.c:4:14
Ltmp5:
    mov DWORD PTR [RBP - 20], 1
    .loc    1 5 14                  ## t.c:5:14
    mov DWORD PTR [RBP - 24], 2
    .loc    1 6 14                  ## t.c:6:14
    mov DWORD PTR [RBP - 28], 0
    .loc    1 8 5                   ## t.c:8:5
    mov EDI, DWORD PTR [RBP - 20]
    add EDI, DWORD PTR [RBP - 24]
    mov DWORD PTR [RBP - 28], EDI



Answer 2:


All of the code from the original article is in your code; there's just some extra stuff around it. This:

0x100000f50:  movl   $1, -20(%rbp)
0x100000f57:  movl   $2, -24(%rbp)
0x100000f5e:  movl   $0, -28(%rbp)
0x100000f65:  movl   -20(%rbp), %edi
0x100000f68:  addl   -24(%rbp), %edi
0x100000f6b:  movl   %edi, -28(%rbp)

corresponds directly to the six instructions discussed in the article.
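As a rough illustration of what those six instructions do, the sketch below replays them one by one in C++. The frame map (keyed by offset from %rbp) and the function name run_six_instructions are assumptions made for this example; real stack memory is of course not a map.

```cpp
#include <cstdint>
#include <map>

// Replays the six instructions on a toy stack frame and returns the
// value left in the slot that holds z. Names are hypothetical.
std::int32_t run_six_instructions() {
    std::map<int, std::int32_t> frame;  // key: byte offset from %rbp
    std::int32_t edi = 0;               // stands in for the %edi register

    frame[-20] = 1;       // movl $1, -20(%rbp)    : x = 1
    frame[-24] = 2;       // movl $2, -24(%rbp)    : y = 2
    frame[-28] = 0;       // movl $0, -28(%rbp)    : z = 0
    edi = frame[-20];     // movl -20(%rbp), %edi  : load x into the register
    edi += frame[-24];    // addl -24(%rbp), %edi  : add y to the register
    frame[-28] = edi;     // movl %edi, -28(%rbp)  : store the sum into z

    return frame[-28];
}
```

The addition happens in a register because the x86 add instruction cannot take two memory operands, which is why x is loaded first and the result is written back to z afterwards.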




Answer 3:


First of all, the assembly listed as "from the original article" uses Intel syntax, whereas the disassembled output in your post uses AT&T syntax. This explains why the order of operands is "back to front" [let's not argue about which is right or wrong, ok?], and why register names are prefixed with % and constants with $. Memory references also differ: dword ptr [reg+offs] in Intel syntax becomes an l suffix on the instruction plus offs(%reg) in AT&T syntax.
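The memory-operand part of that mapping can be sketched as a small C++ function. intel_mem_to_att is a hypothetical helper written for this example; it only handles the simple [reg+offs] and [reg-offs] forms seen in this question (with an optional trailing h for hexadecimal, as in [ebp-14h]) and is not a general assembler-syntax converter.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts an Intel-style memory operand such as
// "[ebp-14h]" or "[RBP - 20]" into AT&T form, e.g. "-20(%ebp)".
std::string intel_mem_to_att(const std::string& operand) {
    // Drop whitespace and lowercase everything, as AT&T listings do.
    std::string s;
    for (unsigned char c : operand) {
        if (!std::isspace(c)) s += static_cast<char>(std::tolower(c));
    }
    if (s.size() < 3 || s.front() != '[' || s.back() != ']') {
        throw std::invalid_argument("expected [reg], [reg+offs] or [reg-offs]");
    }
    s = s.substr(1, s.size() - 2);  // strip the brackets

    std::size_t sign_pos = s.find_first_of("+-");
    if (sign_pos == std::string::npos) {
        return "(%" + s + ")";  // no displacement, e.g. [rbp] -> (%rbp)
    }

    std::string reg = s.substr(0, sign_pos);
    bool negative = (s[sign_pos] == '-');
    std::string digits = s.substr(sign_pos + 1);

    // A trailing 'h' marks a hexadecimal literal in this Intel listing.
    int base = 10;
    if (!digits.empty() && digits.back() == 'h') {
        base = 16;
        digits.pop_back();
    }
    long value = std::stol(digits, nullptr, base);
    if (negative) value = -value;

    // AT&T puts the displacement first, then the register in parentheses.
    return std::to_string(value) + "(%" + reg + ")";
}
```

For instance, the article's [ebp-14h] comes out as -20(%ebp), which shows that the article's y sits 20 bytes below its frame pointer; the operand size (dword ptr) is not part of the operand in AT&T syntax and moves to the l suffix on the mnemonic instead.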

Going from 32-bit to 64-bit also renames some of the registers: %rbp is the 64-bit counterpart of ebp in the article's code.

The actual offsets (e.g. -20) differ partly because the registers are bigger in 64-bit, but also because your function takes argc and argv as arguments, which are stored on the stack at the start of the function. I have a feeling the original article is actually disassembling a function other than main.
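One way to see where -4, -16, -20, -24 and -28 come from is to hand out stack slots downward from the frame pointer, giving each value its natural alignment. The C++ sketch below does that; next_slot is a hypothetical helper, and this is only a simplified model that happens to reproduce the offsets in the question, not a description of how the compiler actually allocates its stack frame.

```cpp
#include <cassert>

// Hypothetical helper: reserve `size` bytes below `cursor` (an offset
// from the frame pointer, zero or negative), aligned to `align` bytes.
// Updates `cursor` and returns the offset of the new slot.
int next_slot(int& cursor, int size, int align) {
    assert(size > 0 && align > 0 && cursor <= 0);
    cursor -= size;
    // Round down (toward more negative) to a multiple of `align`.
    // In C++ the remainder of a negative value is zero or negative.
    int remainder = cursor % align;
    if (remainder != 0) {
        cursor -= (align + remainder);
    }
    return cursor;
}
```

Starting from a cursor of 0 and asking for a 4-byte int (argc), an 8-byte pointer (argv), and then three more 4-byte ints (x, y, z) yields -4, -16, -20, -24 and -28. The jump from -4 to -16 comes from the 8-byte pointer needing 8-byte alignment, which is the "registers are bigger in 64-bit" effect.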



Source: https://stackoverflow.com/questions/15034247/osx-64-bit-c-disassembly-line-by-line
