How do I pass inputs into extended asm?

Question

Consider this code, from my earlier question.

int main(){
    asm("movq $100000000, %rcx;"
            "startofloop: ; "
            "sub $0x1, %rcx; "
            "jne startofloop; ");
}

I would like to make the number of loop iterations a C variable, so I tried the following after reading this document.

int main(){                                      
    int count = 100000000;                       
    asm("movq %0, %rcx;"                         
            "startofloop: ; "                    
            "sub $0x1, %rcx; "                   
            "jne startofloop; ":: "r"(count));   
}

Unfortunately, this fails to compile with the following error.

asm_fail.c: In function ‘main’:
asm_fail.c:3:5: error: invalid 'asm': operand number missing after %-letter
     asm("movq %0, %rcx;"
     ^
asm_fail.c:3:5: error: invalid 'asm': operand number missing after %-letter

What is the correct way to pass the value of the C variable into the assembly?

Answer 1:

If you use an extended assembler template (one with input operands, output operands, or clobbers), you need to prepend an extra % to the register names inside the template: %%rcx in this case. This will solve the issue related to this error:

error: invalid 'asm': operand number missing after %-letter

This will present a new problem. You'll receive an error similar to:

operand type mismatch for 'movq'

The issue is that the "r"(count) input constraint tells the compiler to pick a register that will contain the value of count. Since count is defined as an int, it will choose a 32-bit register. For the sake of argument, assume it chooses EAX. After substitution it would have tried to generate this instruction:

movq %eax, %rcx

You can't use movq to move the contents of a 32-bit register to a 64-bit register and thus the error. The better choice is to use ECX as the target so that both will be of the same type. Revised code would look like:

asm("mov %0, %%ecx;"                         
    "startofloop: ; "                    
    "sub $0x1, %%ecx; "                   
    "jne startofloop; ":: "r"(count));

Alternatively, you could use an input operand of "ri"(count), which lets the compiler choose either a register or an immediate value. At a higher optimization level (-O1, -O2) it will likely determine that count is a constant (100000000) in this case and generate code like:

mov $100000000, %ecx                         
startofloop:
sub $0x1, %ecx
jne startofloop

Rather than being forced to place 100000000 into a register and copy it to ECX it can use an immediate value instead.
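On the size mismatch described earlier: if a 64-bit counter is actually wanted, another option is to widen the C variable so that the register chosen for %0 matches movq. The following is only a sketch, assuming x86-64 and a GCC-compatible compiler using AT&T syntax. The name spin64 is made up for illustration, the numeric local label 1: stands in for startofloop, and the clobber list is there so the sketch is safe to run.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the question): busy-loops `count` times
   with RCX as the counter, then returns `count` unchanged.
   Assumes x86-64 and a GCC-compatible compiler (AT&T syntax). */
static int64_t spin64(int64_t count)
{
    assert(count > 0);           /* 0 or a negative value would loop for a very long time */
    __asm__ volatile(            /* __asm__ is the same as asm, but also accepted with -std=c11 */
        "movq %0, %%rcx\n\t"     /* %0 is a 64-bit register here, so movq is valid */
        "1:\n\t"
        "subq $1, %%rcx\n\t"
        "jne 1b"                 /* jump back to the nearest label 1 */
        :                        /* no output operands */
        : "r"(count)             /* int64_t makes the compiler pick a 64-bit register */
        : "rcx", "cc");          /* RCX and the flags are modified */
    return count;                /* the input operand itself was only read */
}
```

Calling spin64(1000) runs the loop 1000 times and hands back 1000, since only the copy in RCX is decremented.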

A serious problem in your template is that you destroy the contents of ECX but GCC has no knowledge of this. GCC doesn't actually parse the instructions inside the template to determine what the code does. It has no idea you have clobbered ECX. The compiler may rely on ECX having the same value before and after the template. If you destroy a register not referenced in the output operands, you must explicitly list it in the clobber list. Something like this would work:

asm("mov %0, %%ecx;"                         
    "startofloop: ; "                    
    "sub $0x1, %%ecx; "                   
    "jne startofloop; ":: "ri"(count) : "rcx");

Now GCC knows it can't rely on the value in RCX being the same value before and after the template is executed.
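To make the effect of the clobber concrete, here is a small sketch in which a live value is likely to sit in ECX when the template runs. It assumes x86-64 with the System V calling convention (where the fourth integer argument arrives in ECX) and a GCC-compatible compiler; sum_after_spin is a hypothetical name, and the numeric label 1: replaces startofloop.

```c
#include <assert.h>

/* Hypothetical helper: spins `count` times in ECX, then returns a + b + c + d.
   Under the System V x86-64 ABI `d` is passed in ECX, so without the "rcx"
   clobber the compiler would be entitled to assume `d` is still there. */
static int sum_after_spin(int a, int b, int c, int d, int count)
{
    assert(count > 0);           /* 0 or a negative value would loop for a very long time */
    __asm__ volatile(            /* __asm__ is the same as asm, but also accepted with -std=c11 */
        "mov %0, %%ecx\n\t"
        "1:\n\t"
        "sub $1, %%ecx\n\t"
        "jne 1b"
        :                        /* no output operands */
        : "ri"(count)
        : "rcx", "cc");          /* tell the compiler RCX and the flags are destroyed */
    return a + b + c + d;
}
```

With the clobber in place the compiler has to keep d somewhere other than RCX across the template, so sum_after_spin(1, 2, 3, 4, 1000) yields 10.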

Rather than using a fixed register as your internal counter, you can let GCC pick any register that is available, which means the clobber is no longer needed. Create a dummy variable (a temporary) as an output operand and count with that. To keep the code from being optimized out altogether, mark the assembler template volatile. The qualifier matters here because the template now has an output operand that is never used afterwards; an asm statement with no output operands is implicitly volatile. Code like this would work:

int count=100000000;
int dummy;
asm volatile("mov %1, %0;"                         
    "startofloop: ; "                    
    "sub $0x1, %0; "                   
    "jne startofloop; ":"=rm"(dummy): "ri"(count));

The =rm output constraint says that either a memory location or a register can be used for this operand. Giving the choice to the compiler allows the opportunity to generate better code. At an optimization level of -O1 you would likely find the code generated would look like:

mov    $0x5f5e100,%ebx
startofloop:
sub    $0x1,%ebx
jne    startofloop

In this case the compiler chose to use an immediate operand for count ($0x5f5e100 = $100000000). The dummy variable was optimized down to the register EBX.
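Wrapped in a function, the same idea might look like the sketch below. Several things here are my assumptions rather than part of the answer: the function name spin, the use of "=r" instead of "=rm" (so every instruction has a register operand from which the assembler can infer the operand size), and the %= suffix on the label, which GCC replaces with a number unique to each asm statement so the label cannot collide if the function is inlined more than once.

```c
#include <assert.h>

/* Hypothetical helper: counts down from `count` in a compiler-chosen
   register and returns the final counter value.
   Assumes x86-64 and a GCC-compatible compiler (AT&T syntax). */
static int spin(int count)
{
    int dummy;                   /* scratch output; the compiler picks its register */
    assert(count > 0);           /* 0 or a negative value would loop for a very long time */
    __asm__ volatile(            /* __asm__ is the same as asm, but also accepted with -std=c11 */
        "mov %1, %0\n\t"         /* copy the input into the scratch register */
        ".Lspin%=:\n\t"          /* %= expands to a per-statement unique number */
        "sub $1, %0\n\t"
        "jne .Lspin%="
        : "=r"(dummy)            /* output: register only (see note above) */
        : "ri"(count)            /* input: register or immediate */
        : "cc");                 /* the flags are modified */
    return dummy;
}
```

The loop exits only when the counter reaches zero, so spin(1000) returns 0.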

There are other tricks you can do to improve the template. One can read more about extended assembler templates in the GNU documentation.

Your code appeared to preserve the value in the variable count. If count does not need to have the same value before and after the template is executed, you could use count for both input and output. That code could look like:

asm volatile("startofloop: ; "
    "sub $0x1, %0; "
    "jne startofloop; ":"+rm"(count): );

+rm means that the output operand is also being used as an input operand. In this case count should always be zero when finished.
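A self-contained sketch of the read-write operand follows, under a few assumptions of mine: x86-64 with a GCC-compatible compiler, a hypothetical function name countdown, "+r" rather than "+rm" so the operand is always a register, and a numeric local label in place of startofloop.

```c
#include <assert.h>

/* Hypothetical helper: decrements `count` in place until it reaches zero
   and returns the final value.
   Assumes x86-64 and a GCC-compatible compiler (AT&T syntax). */
static int countdown(int count)
{
    assert(count > 0);           /* 0 would wrap around and loop 2^32 times */
    __asm__ volatile(            /* __asm__ is the same as asm, but also accepted with -std=c11 */
        "1:\n\t"
        "sub $1, %0\n\t"
        "jne 1b"                 /* repeat until the subtraction yields zero */
        : "+r"(count)            /* "+": the operand is both read and written */
        :                        /* no separate inputs */
        : "cc");                 /* the flags are modified */
    return count;
}
```

Because the parameter is a copy, the caller's variable is untouched; the function simply reports the value left in the operand, which is 0 for any positive starting count.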

If you use the GCC -S option to output the generated assembly code then you may wish to alter your template so the output looks cleaner. Rather than using a ; (semicolon) use \n\t instead. This will break up the assembler template into multiple lines and add indentation. An example:

asm volatile("mov %1, %0\n\t"                         
    "startofloop:\n\t"                    
    "sub $0x1, %0\n\t"                   
    "jne startofloop\n\t":"=rm"(dummy): "ri"(count));

Generally speaking, you shouldn't use inline assembler templates unless you have no alternative. Code it in C and guide the compiler to output the assembler you want, or use compiler intrinsics if need be. Inline assembler should be used as a last resort, or if your homework demands it. David Wohlferd wrote a Wiki article on the subject.
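For this particular loop, one way the "code it in C" advice might look is sketched below. It is an illustration only: spin_c is a made-up name, and the instructions the compiler emits are up to it (with a volatile local the counter will probably live in memory rather than in a register).

```c
#include <assert.h>

/* Hypothetical helper: a countdown loop in plain C. The volatile qualifier
   forces every read and write of `i` to happen, so the compiler cannot
   delete the loop even though it has no other effect. */
static int spin_c(int count)
{
    volatile int i = count;
    assert(count > 0);           /* keep the running time bounded and predictable */
    while (i != 0) {
        i = i - 1;               /* one read and one write of i per iteration */
    }
    return i;                    /* zero once the loop has finished */
}
```

This trades control over the exact instructions for portability: the same source builds on any architecture, with no constraints or clobbers to get wrong.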

Source: https://stackoverflow.com/questions/37956211/how-do-i-pass-inputs-into-extended-asm

Tags

gcc

x86-64

inline-assembly

att