Assembly - Swap function - Why will it not work?

我在风中等你 2021-01-27 21:50

I need to write a function that swaps the values stored at the addresses &x and &y (that is, it swaps *(&x) and *(&y)).

Swap:

    push EBP
    mov EBP,ESP
           


        
2 answers
  •  野趣味 (original poster)
    2021-01-27 22:30

    Aside from Alexey's bugfix, you could make this significantly more efficient. (Of course inlining the swap and optimizing at the call site is even better.)

    There's no need for a local temporary on the stack: you could instead reload one of the addresses twice, or save/restore ESI and use it as a temporary.

    You're actually destroying EBX, which is call-preserved in all the normal C calling conventions. In most 32-bit x86 calling conventions, EAX, ECX, and EDX are the three call-clobbered registers you can use without saving and restoring; the others are call-preserved. That is, your caller expects their values to survive the call, so you can use them only if you put the original value back before returning. This is why EBP has to be restored after you use it as a frame pointer.


    What gcc -O3 -m32 does when compiling a stand-alone (not inlined) definition for a swap function is save/restore EBX so it has 4 registers to play with. clang chooses ESI.

    void swap(int *px, int *py) {
        int tmp = *px;
        *px = *py;
        *py = tmp;
    }
    

    On the Godbolt compiler explorer:

    # gcc8.2 -O3 -m32 -fverbose-asm
    # gcc itself emitted the comments on the following instructions
    swap:
            push    ebx     #
            mov     edx, DWORD PTR [esp+8]    # px, px
            mov     eax, DWORD PTR [esp+12]   # py, py
            mov     ecx, DWORD PTR [edx]      # tmp, *px_3(D)
            mov     ebx, DWORD PTR [eax]      # tmp91, *py_5(D)
            mov     DWORD PTR [edx], ebx      # *px_3(D), tmp91
            mov     DWORD PTR [eax], ecx      # *py_5(D), tmp
            pop     ebx       #
            ret  
    
    # DWORD PTR is the gas .intel_syntax equivalent of NASM's DWORD
    # you can just remove them all because the register implies an operand size
    

    It also avoids making a legacy stack-frame. You can add -fno-omit-frame-pointer to the compiler options to see code-gen with a frame pointer, if you want. (Godbolt will recompile and show you the asm. Very handy site for exploring compiler options and code changes.)

    64-bit calling conventions already pass args in registers and have enough scratch registers, so we get just 4 instructions (plus the ret), which is much more efficient.
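
    For comparison, here is a rough C model of that 4-instruction 64-bit version. It assumes the x86-64 System V convention, where the first two pointer args arrive in RDI and RSI. The name `swap64_model` and the register-named variables are made up for illustration, and the asm in the comments is what an optimizing compiler would typically be expected to emit, not a captured listing.

```c
/* Sketch: C model of a 64-bit swap. Each statement corresponds to one mov.
   Parameter names note the registers assumed under x86-64 System V. */
void swap64_model(int *rdi_px, int *rsi_py)
{
    int eax = *rdi_px;   /* mov eax, [rdi]   ; tmp = *px */
    int edx = *rsi_py;   /* mov edx, [rsi]   ; load *py  */
    *rdi_px = edx;       /* mov [rdi], edx   ; *px = *py */
    *rsi_py = eax;       /* mov [rsi], eax   ; *py = tmp */
}
```

    Both loads happen before either store, so no stack temporary and no call-preserved register is needed.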


    As I mentioned, another option is to reload one of the pointer args twice:

    swap:
           # without a push, offsets relative to ESP are smaller by 4
            mov     edx, [esp+4]    # edx = px   reused later
            mov     eax, [esp+8]    # eax = py   overwritten below, reloaded later
            mov     ecx, [edx]      # ecx = tmp = *px   lives for the whole function
    
            mov     eax, [eax]      # eax = *py   destroying our register copy of py
            mov     [edx], eax      # *px = *py;  done with px, can now destroy it
    
            mov     edx, [esp+8]    # edx = py   reloaded from the stack
            mov     [edx], ecx      # *py = tmp;
            ret  
    

    Only 7 instructions instead of 8 (not counting the ret in either version). Loading the same value twice is very cheap, and out-of-order execution means the store address can still be ready early, even though in program order the load of that address is the instruction right before the store.
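
    One way to sanity-check a hand-written version is to run it against the C reference, including the case where both pointers are the same. A minimal sketch: `swap_ref` and `check_swap` are hypothetical names, and plugging in your own asm routine assumes it follows the same 32-bit C calling convention discussed above.

```c
/* Reference implementation, same logic as the C version earlier in this answer. */
void swap_ref(int *px, int *py)
{
    int tmp = *px;
    *px = *py;
    *py = tmp;
}

/* Returns 1 if fn behaves like a swap on these inputs, 0 otherwise. */
int check_swap(void (*fn)(int *, int *))
{
    int x = 5, y = -3;
    fn(&x, &y);
    if (x != -3 || y != 5)
        return 0;

    /* Aliased pointers: swapping a value with itself must leave it unchanged. */
    int z = 42;
    fn(&z, &z);
    return z == 42;
}
```

    To test an asm routine, declare it with a prototype such as `void Swap(int *, int *);` (assuming the symbol name matches your platform's C naming) and pass it to `check_swap` in place of `swap_ref`. A test like this will not reliably catch a clobbered call-preserved register such as EBX, so it complements reading the code rather than replacing it.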
