GCC inline assembly with stack operation

太阳男子 2020-12-21 16:08

I need inline assembly code with the following properties:

  • It contains a pair of push/pop operations (so the stack stays balanced)
  • I also hav
3 answers
  • 2020-12-21 16:45

    Instead of moving the value into ecx inside the assembly template, let the compiler place the operand in ecx directly with the "c" constraint:

        : : "c"(foo)
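
    As a minimal sketch of this suggestion, assuming GCC or Clang on x86: the hypothetical helper rotate_left below (not from the question) needs its count in cl, so it asks for it with "c" rather than writing a mov into ecx by hand. On other targets it falls back to plain C.

    ```c
    #include <stdint.h>

    /* Hypothetical helper: rotate `value` left by `count` bits. */
    static uint32_t rotate_left(uint32_t value, uint32_t count)
    {
    #if defined(__i386__) || defined(__x86_64__)
        /* "c"(count) makes the compiler load count into ecx before the asm,
           so the template needs no mov and no "ecx" clobber. */
        __asm__("roll %%cl, %0"
                : "+r"(value)   /* read and written in place */
                : "c"(count)    /* variable rotate counts must be in cl */
                : "cc");        /* rol modifies the flags */
        return value;
    #else
        /* Portable fallback for non-x86 targets. */
        count &= 31;
        return (value << count) | (value >> ((32 - count) & 31));
    #endif
    }
    ```

    Because the compiler knows ecx is an input, it can schedule the load itself or reuse a value that is already in ecx.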
    
  • 2020-12-21 16:51

    The direct use of the stack pointer to reference local variables is probably caused by the use of compiler optimizations. I think you could solve the issue in a couple of ways:

    • Disabling frame pointer optimizations (-fno-omit-frame-pointer in GCC);
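    • Listing esp in the clobber list so the compiler knows its value is being modified (check your compiler for compatibility: the GCC manual says the clobber list should not contain the stack pointer, and newer GCC versions deprecate doing so).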
  • 2020-12-21 16:56

    Modifying ESP inside inline-asm should generally be avoided when you have any memory inputs / outputs, so you don't have to disable optimizations or force the compiler to make a stack-frame with EBP some other way. One major advantage is that you (or the compiler) can then use EBP as an extra free register; potentially a significant speedup if you're already having to spill/reload stuff. If you're writing inline asm, presumably this is a hotspot so it's worth spending the extra code-size to use ESP-relative addressing modes.

    In x86-64 code, there's an added obstacle to using push/pop safely, because you can't tell the compiler you want to clobber the red-zone below RSP. (You can compile with -mno-red-zone, but there's no way to disable it from the C source.) You can get problems like this where you clobber the compiler's data on the stack. No 32-bit x86 ABI has a red-zone, though, so this only applies to x86-64 System V. (Or non-x86 ISAs with a red-zone.)
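
    If you do want push/pop in x86-64 inline asm anyway, one workaround is to step over the red zone by hand before pushing. The following is only a sketch under assumptions: the helper name roundtrip_via_stack is hypothetical, the 128-byte offset is the red-zone size from the x86-64 System V ABI, and all operands are restricted to registers so nothing in the template is addressed relative to the moved RSP.

    ```c
    #include <stdint.h>

    /* Hypothetical helper: pushes x and pops it back, purely to show the
       stack-pointer adjustment around a balanced push/pop pair. */
    static uint64_t roundtrip_via_stack(uint64_t x)
    {
    #if defined(__x86_64__)
        uint64_t out;
        __asm__("lea  -128(%%rsp), %%rsp\n\t"  /* skip the 128-byte red zone */
                "push %[in]\n\t"               /* lands below the compiler's data */
                "pop  %[out]\n\t"
                "lea  128(%%rsp), %%rsp"       /* restore RSP before leaving */
                : [out] "=r"(out)
                : [in] "r"(x));                /* "r" only: no RSP-relative operands */
        return out;
    #else
        /* Other targets: no asm, just return the value. */
        return x;
    #endif
    }
    ```

    lea is used instead of add/sub so the flags are left untouched. With "m" operands this would not be safe, since the compiler may address them relative to RSP.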

    You only need -fno-omit-frame-pointer for that function if you want to do asm-only things like using push as a stack data structure, where the number of pushes is variable. Or perhaps if you're optimizing for code-size.

    You can always write a whole non-inline function in asm and put it in a separate file, then you have full control. But only do that if your function is large enough to be worth the call/ret overhead, e.g. if it includes a whole loop; don't make the compiler call a short non-looping function inside a C inner loop, destroying all the call-clobbered registers and having to make sure globals are in sync.


    It seems you're using push / pop inside inline asm because you don't have enough registers, and need to save/reload something. You don't need to use push/pop for save/restore. Instead, use dummy output operands with "=m" constraints to get the compiler to allocate stack space for you, and use mov to/from those slots. (Of course you're not limited to mov; it can be a win to use a memory source operand for an ALU instruction if you only need the value once or twice.)

    This may be slightly worse for code-size, but is usually not worse for performance (and can be better). If that's not good enough, write the whole function (or the whole loop) in asm so you don't have to wrestle with the compiler.

    int foo(char *p, int a, int b) {
        int t1,t2;  // dummy output spill slots
        int r1,r2;  // dummy output tmp registers
        int res;
    
        asm ("# operands: %0  %1  %2  %3  %4  %5  %6  %7  %8\n\t"
             "imull  $123, %[b], %[res]\n\t"
             "mov   %[res], %[spill1]\n\t"
             "mov   %[a], %%ecx\n\t"
             "mov   %[b], %[tmp1]\n\t"  // let the compiler allocate tmp regs, unless you need specific regs e.g. for a shift count
             "mov   %[spill1], %[res]\n\t"
        : [res] "=&r" (res),
          [tmp1] "=&r" (r1), [tmp2] "=&r" (r2),  // early-clobber
          [spill1] "=m" (t1), [spill2] "=&rm" (t2)  // allow spilling to a register if there are spare regs
          , [p] "+&r" (p)
          , "+m" (*(char (*)[]) p) // dummy in/output instead of memory clobber
        : [a] "rmi" (a), [b] "rm" (b)  // a can be an immediate, but b can't
        : "ecx"
        );
    
        return res;
    
        // p unused in the rest of the function
        // so it's really just an input to the asm,
        // which the asm is allowed to destroy
    }
    

    This compiles to the following asm with gcc 7.3 -O3 -m32 on the Godbolt compiler explorer. Note the asm comment showing what the compiler picked for all the template operands: it picked 12(%esp) for %[spill1] and %edi for %[spill2] (because I used "=&rm" for that operand, the compiler saved/restored %edi outside the asm and gave it to us for that dummy operand).

    foo(char*, int, int):
        pushl   %ebp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        subl    $16, %esp
        movl    36(%esp), %edx
        movl    %edx, %ebp
    #APP
    # 19 "/tmp/compiler-explorer-compiler118120-55-w92ge8.v797i/example.cpp" 1
            # operands: %eax  %ebx  %esi  12(%esp)  %edi  %ebp  (%edx)  40(%esp)  44(%esp)
        imull  $123, 44(%esp), %eax
        mov   %eax, 12(%esp)
        mov   40(%esp), %ecx
        mov   44(%esp), %ebx
        mov   12(%esp), %eax
    
    # 0 "" 2
    #NO_APP
        addl    $16, %esp
        popl    %ebx
        popl    %esi
        popl    %edi
        popl    %ebp
        ret
    

    Hmm, the dummy memory operand to tell the compiler which memory we modify seems to have resulted in dedicating a register to that, I guess because the p operand is early-clobber so it can't use the same register. I guess you could risk leaving off the early-clobber if you're confident none of the other inputs will use the same register as p. (i.e. that they don't have the same value).
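
    The earlier remark about using a spill slot directly as a memory source operand for an ALU instruction can be sketched in a smaller, testable form. This assumes GCC or Clang on x86; scale_and_add is a hypothetical function, not part of the question, and other targets get a plain C fallback.

    ```c
    #include <stdint.h>

    /* Hypothetical example: computes a * 123 + b, parking b in a
       compiler-allocated stack slot instead of using push/pop. */
    static int32_t scale_and_add(int32_t a, int32_t b)
    {
    #if defined(__i386__) || defined(__x86_64__)
        int32_t slot;  /* dummy output: the compiler picks the stack slot */
        int32_t res;
        __asm__("movl  %[b], %[slot]\n\t"       /* save b without push */
                "imull $123, %[a], %[res]\n\t"
                "addl  %[slot], %[res]"         /* memory source operand, no reload */
                : [res] "=&r"(res),             /* early-clobber: written before the last read */
                  [slot] "=m"(slot)
                : [a] "r"(a), [b] "r"(b)
                : "cc");
        return res;
    #else
        /* Portable fallback for non-x86 targets. */
        return a * 123 + b;
    #endif
    }
    ```

    Since the asm never changes the stack pointer, whatever address the compiler chooses for %[slot] stays valid for the whole template.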
