Assembly - Swap function - Why it will not work?

后端 未结 2 834
我在风中等你
我在风中等你 2021-01-27 21:50

I need to create a function that swaps the value of &x with the value of &y (meaning swap *(&y) and *(&x).

Swap:

    push EBP
    mov EBP,ESP
           


        
相关标签:
2条回答
  • 2021-01-27 22:19

    You don't actually reserve memory on the stack that you use when you access a dword at [EBP-4]. It can get overwritten by things like interrupt routines, signal handlers, asynchronously called procedures, whatever applies in your OS.

    The code should be like this instead:

    swap:
        push  EBP
        mov   EBP,ESP           ; make a traditional stack frame
    
        sub   ESP, 4         ; reserve memory for a local variable at [EBP-4]
    
        mov   EBX, [EBP+12]        ; ebx = &x
        mov   EAX, DWORD [EBX]     ; eax = x
        mov   DWORD [EBP-4], EAX   ; [ebp-4] = eax = x
        mov   EDX, [EBP+8]         ; edx = &y
        mov   EAX, DWORD [EDX]     ; eax = y
        mov   DWORD [EBX], EAX     ; *&x = y
        mov   EAX, DWORD [EBP-4]   ; eax = x reloaded from the local
        mov   DWORD [EDX], EAX     ; *&y = x
    
        leave          ; remove locals (by restoring ESP), restore EBP
    
        ret
    

    Also, make sure that you're passing as parameters the addresses of the variables x and y, not the values of the variables. push x+push y will pass the addresses of x and y in NASM but they will pass values of x and y in TASM and MASM.

    0 讨论(0)
  • 2021-01-27 22:30

    Aside from Alexey's bugfix, you could make this significantly more efficient. (Of course inlining the swap and optimizing at the call site is even better.)

    There's no need for a local temporary on the stack: you could instead reload one of the addresses twice, or save/restore ESI and use it as a temporary.

    You're actually destroying EBX, which is call-preserved in all the normal C calling conventions. In most 32-bit x86 calling conventions, EAX, ECX, and EDX are the three call-clobbered registers you can use without saving/restoring, while the others are call-preserved. (So i.e. your caller expects you not to destroy their values, so you can only use them if you put back the original value. This is why EBP has to be restored after you use it for a frame pointer.)


    What gcc -O3 -m32 does when compiling a stand-alone (not inlined) definition for a swap function is save/restore EBX so it has 4 registers to play with. clang chooses ESI.

    void swap(int *px, int *py) {
        int tmp = *px;
        *px = *py;
        *py = tmp;
    }
    

    On the Godbolt compiler explorer:

    # gcc8.2 -O3 -m32 -fverbose-asm
    # gcc itself emitted the comments on the following instructions
    swap:
            push    ebx     #
            mov     edx, DWORD PTR [esp+8]    # px, px
            mov     eax, DWORD PTR [esp+12]   # py, py
            mov     ecx, DWORD PTR [edx]      # tmp, *px_3(D)
            mov     ebx, DWORD PTR [eax]      # tmp91, *py_5(D)
            mov     DWORD PTR [edx], ebx      # *px_3(D), tmp91
            mov     DWORD PTR [eax], ecx      # *py_5(D), tmp
            pop     ebx       #
            ret  
    
    # DWORD PTR is the gas .intel_syntax equivalent of NASM's DWORD
    # you can just remove them all because the register implies an operand size
    

    It also avoids making a legacy stack-frame. You can add -fno-omit-frame-pointer to the compiler options to see code-gen with a frame pointer, if you want. (Godbolt will recompile and show you the asm. Very handy site for exploring compiler options and code changes.)

    64-bit calling conventions already have args in registers, and have enough scratch regs so we just get 4 instructions, much more efficient.


    As I mentioned, another option is to reload one of the pointer args twice:

    swap:
           # without a push, offsets relative to ESP are smaller by 4
            mov     edx, [esp+4]    # edx = px   reused later
            mov     eax, [esp+8]    # eax = py   also reused later
            mov     ecx, [edx]      # ecx = tmp = *px   lives for the whole function
    
            mov     eax, [eax]      # eax = *py   destroying our register copy of py
            mov    [edx], eax       # *px = *py;  done with px, can now destroy it
    
            mov     edx, [esp+8]   # edx = py
            mov    [edx], ecx       # *py = tmp;
            ret  
    

    Only 7 instructions instead of 8. Loading the same value twice is very cheap, and out-of-order execution means it's not a problem to have the store address ready quickly even though in program order it's only the instruction right before the store that loads the address.

    0 讨论(0)
提交回复
热议问题