How Do I Use Labels In GCC Inline Assembly?

攒了一身酷 2020-12-10 09:29

I'm trying to learn x86-64 inline assembly and decided to implement this very simple swap method that orders a and b in ascending order:

3 answers
  • 2020-12-10 10:12

    You cannot just put a bunch of asm statements inline like that. The optimizer is free to re-order, duplicate, and drop them based on what constraints it knows. (In your case, it knows none.)

    So firstly, you should consolidate the asm together, with proper read/write/clobber constraints. Secondly, there is a special asm goto form that gives the assembly access to C-level labels.

    void swap(int *a, int *b) {
        int tmp1, tmp2;
        asm(
            "mov (%2), %0\n"
            "mov (%3), %1\n"
            : "=&r" (tmp1), "=r" (tmp2)   // early clobber: %0 is written before %3 (b) is read
            : "r" (a), "r" (b)
            : "memory"   // pointer in register doesn't imply that the pointed-to memory has to be "in sync"
            // or use "m" memory source operands to let the compiler pick the addressing mode
        );
        asm goto(
            "cmp %1, %0\n"
            "jle %l4\n"
            "mov %1, (%2)\n"
            "mov %0, (%3)\n"
            :
            : "r" (tmp1), "r" (tmp2), "r" (a), "r" (b)
            : "cc", "memory"
            : L1
        );
    L1:
        return;
    }
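
    As a smaller illustration of the asm goto form, here is a minimal sketch in which only the compare and the branch live in the asm, while the loads and stores stay in C. The function name swap_goto and the label done are made up for this example, and the #else branch is only a plain-C fallback for non-x86 targets.

```c
/* Hypothetical name: orders *a and *b in ascending order. */
void swap_goto(int *a, int *b)
{
    int x = *a, y = *b;               /* loads are done in C */
#if defined(__x86_64__) || defined(__i386__)
    __asm__ goto (
        "cmp %1, %0\n\t"              /* flags = x - y (AT&T operand order) */
        "jle %l[done]"                /* x <= y: jump straight to the C label */
        : /* no outputs */
        : "r" (x), "r" (y)
        : "cc"
        : done);                      /* labels the asm may jump to */
#else
    if (x <= y) goto done;            /* plain-C fallback for other targets */
#endif
    *a = y;                           /* reached only when x > y */
    *b = x;
done:
    return;
}
```

    Because the stores are ordinary C, no "memory" clobber is needed here; the label list tells the compiler that control may leave the asm at done.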
    
  • 2020-12-10 10:20

    There are plenty of tutorials - including this one (probably the best I know of), and some info on operand size modifiers.

    Here's the first implementation - swap_2 :

    void swap_2 (int *a, int *b)
    {
        int tmp0, tmp1;
    
        __asm__ volatile (
            "movl (%0), %k2\n\t" /* %2 (tmp0) = (*a) */
            "movl (%1), %k3\n\t" /* %3 (tmp1) = (*b) */
            "cmpl %k3, %k2\n\t"
            "jle  %=f\n\t"       /* if (%2 <= %3) (at&t!) */
            "movl %k3, (%0)\n\t"
            "movl %k2, (%1)\n\t"
            "%=:\n\t"
    
            : "+r" (a), "+r" (b), "=r" (tmp0), "=r" (tmp1) :
            : "memory" /* "cc" */ );
    }
    

    A few notes:

    • volatile (or __volatile__) is required: the compiler only 'sees' (a) and (b), and doesn't 'know' that the asm may exchange the values they point to. Because tmp0 and tmp1 are never used afterwards, it would otherwise treat them as unused variables and be free to optimize the whole asm statement away.

    • "+r" means that this is both an input and output that may be modified; only it isn't in this case, and they could strictly be input only - more on that in a bit...

    • The 'l' suffix on 'movl' isn't really necessary; neither is the 'k' (32-bit) length modifier for the registers. Since you're using the Linux (ELF) ABI, an int is 32 bits for both IA32 and x86-64 ABIs.

    • The %= token expands to a number that is unique to each instance of the asm statement, which gives us a unique local label. BTW, the jump syntax <label>f means a forward jump, and <label>b means back.

    • For correctness, we need "memory" as the compiler has no way of knowing if values from dereferenced pointers have been changed. This may be an issue in more complex inline asm surrounded by C code, as it invalidates all currently held values in memory, and is often a sledgehammer approach. Appearing at the end of a function in this fashion, it's not going to be an issue, but you can read more on it in the GCC documentation (see: Clobbers).

    • The "cc" flags register clobber is detailed in the same section. On x86, it does nothing. Some writers include it for clarity, but since practically all non-trivial asm statements affect the flags register, it's just assumed to be clobbered by default.
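
    To see why %= matters, consider a small helper that gets expanded more than once in the same function. The sketch below is only an illustration: the names at_least and clamp_both are invented here, and the #else branch is a plain-C fallback for non-x86 targets. With a fixed named label in place of %=, two inlined copies would define the same label twice and the assembler would be expected to reject it.

```c
/* Hypothetical helper: returns the larger of x and lo. */
static inline int at_least(int x, int lo)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ (
        "cmp %1, %0\n\t"     /* flags = x - lo (AT&T operand order) */
        "jge %=f\n\t"        /* x >= lo: keep x, jump forward to the label */
        "mov %1, %0\n\t"     /* otherwise x = lo */
        "%=:"                /* numeric local label, unique per asm instance */
        : "+r" (x)
        : "r" (lo)
        : "cc");
#else
    if (x < lo) x = lo;      /* plain-C fallback for other targets */
#endif
    return x;
}

/* Two expansions of the same asm may end up in this one function. */
int clamp_both(int p, int q)
{
    return at_least(p, 0) + at_least(q, 10);
}
```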

    Here's the C implementation - swap_1 :

    void swap_1 (int *a, int *b)
    {
        if (*a > *b)
        {
            int t = *a; *a = *b; *b = t;
        }
    }
    

    Compiling with gcc -O2 for x86-64 ELF, I get identical code for swap_1 and swap_2. It is just a bit of luck that the compiler picked the same free registers for tmp0 and tmp1 as it uses for its own temporaries. Cutting out the noise, like the .cfi directives, etc., gives:

    swap_2:
            movl (%rdi), %eax
            movl (%rsi), %edx
            cmpl %edx, %eax
            jle  21f
            movl %edx, (%rdi)
            movl %eax, (%rsi)
            21:
            ret
    

    As stated, the swap_1 code was identical, except that the compiler chose .L1 for its jump label. Compiling the code with -m32 generated the same core code (apart from using the tmp registers in a different order), plus some extra overhead: the IA32 ELF ABI passes parameters on the stack, whereas the x86-64 ABI passes the first two parameters in %rdi and %rsi respectively.


    Treating (a) and (b) as input only - swap_3 :

    void swap_3 (int *a, int *b)
    {
        int tmp0, tmp1;
    
        __asm__ volatile (
            "mov (%[a]), %[x]\n\t" /* x = (*a) */
            "mov (%[b]), %[y]\n\t" /* y = (*b) */
            "cmp %[y], %[x]\n\t"
            "jle  %=f\n\t"         /* if (x <= y) (at&t!) */
            "mov %[y], (%[a])\n\t"
            "mov %[x], (%[b])\n\t"
            "%=:\n\t"
    
            : [x] "=&r" (tmp0), [y] "=&r" (tmp1)
            : [a] "r" (a), [b] "r" (b) : "memory" /* "cc" */ );
    }
    

    I've done away with the 'l' suffix and 'k' modifiers here, because they're not needed. I've also used the 'symbolic name' syntax for operands, as it often helps to make the code more readable.

    (a) and (b) are now indeed input-only registers. So what does the "=&r" syntax mean? The & denotes an early-clobber operand: the output may be written before the asm has finished using its input operands, so the compiler must choose registers different from those selected for the inputs.

    Once again, the compiler generates the same code as it did for swap_1 and swap_2.
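
    The early-clobber rule is easier to see in isolation. In the sketch below (load_sum is a made-up name, and the #else branch is a plain-C fallback for non-x86 targets), the output is written by the first instruction while the input b is still needed by the second. Without the &, the compiler would be allowed to put r in the same register as b.

```c
/* Hypothetical helper: returns *a + *b. */
int load_sum(const int *a, const int *b)
{
    int r;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ (
        "mov (%[a]), %[r]\n\t"   /* r is written here ... */
        "add (%[b]), %[r]"       /* ... while b is still read here */
        : [r] "=&r" (r)          /* & keeps r out of the input registers */
        : [a] "r" (a), [b] "r" (b)
        : "cc", "memory");       /* the asm reads memory through a and b */
#else
    r = *a + *b;                 /* plain-C fallback for other targets */
#endif
    return r;
}
```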


    I wrote way more than I planned for this answer, but as you can see, it's very difficult to keep track of everything the compiler must be told, as well as the idiosyncrasies of each instruction set (ISA) and ABI.

  • 2020-12-10 10:26

    You cannot assume values are in any particular register in your asm code. You need to use constraints to tell gcc what values you want to read and write, and get it to tell you which registers they are in. The gcc docs tell you most of what you need to know, but are pretty dense. There are also tutorials out there that you can easily find with a web search.
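
    As a rough sketch of what that looks like in practice, the version below never names a register. The values are loaded and stored in C, and the asm only refers to the operands %[x] and %[y], which gcc maps to whatever registers it picked. The name order_regs is invented for this example, and the #else branch is a plain-C fallback for non-x86 targets.

```c
/* Hypothetical name: orders *a and *b in ascending order. */
void order_regs(int *a, int *b)
{
    int x = *a, y = *b;              /* the compiler decides where x and y live */
#if defined(__x86_64__) || defined(__i386__)
    __asm__ (
        "cmp %[y], %[x]\n\t"         /* flags = x - y (AT&T operand order) */
        "jle %=f\n\t"                /* already ordered: skip the exchange */
        "xchg %[x], %[y]\n\t"        /* swap the two registers */
        "%=:"
        : [x] "+r" (x), [y] "+r" (y) /* both are read and written */
        :
        : "cc");
#else
    if (x > y) { int t = x; x = y; y = t; }  /* plain-C fallback */
#endif
    *a = x;                          /* stores are done in C, so no "memory" clobber */
    *b = y;
}
```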
