问题
Consider the following small function:
void foo(int* iptr) {
iptr[10] = 1;
__asm__ volatile ("nop"::"r"(iptr):);
iptr[10] = 2;
}
Using gcc, this compiles to:
foo:
nop
mov DWORD PTR [rdi+40], 2
ret
Note in particular, that the first write to iptr
, iptr[10] = 1
doesn't occur at all: the inline asm nop
is the first thing in the function, and only the final write of 2
appears (after the ASM call). Apparently the compiler decides that it only needs to provide an up-to-date version of the value of iptr
itself, but not the memory it points to.
I can tell the compiler that memory must be up to date with a memory
clobber, like so:
void foo(int* iptr) {
iptr[10] = 1;
__asm__ volatile ("nop"::"r"(iptr):"memory");
iptr[10] = 2;
}
which results in the expected code:
foo:
mov DWORD PTR [rdi+40], 1
nop
mov DWORD PTR [rdi+40], 2
ret
However, this is too strong of a condition, since it tells the compiler all memory has to be written. For example, in the following function:
void foo2(int* iptr, long* lptr) {
iptr[10] = 1;
lptr[20] = 100;
__asm__ volatile ("nop"::"r"(iptr):);
iptr[10] = 2;
lptr[20] = 200;
}
The desired behavior is to let the compiler optimize away the first write to lptr[20]
, but not the first write to iptr[10]
. The "memory"
clobber cannot achieve this because it means both writes have to occur:
foo2:
mov DWORD PTR [rdi+40], 1
mov QWORD PTR [rsi+160], 100 ; lptr[10] written unecessarily
nop
mov DWORD PTR [rdi+40], 2
mov QWORD PTR [rsi+160], 200
ret
Is there some way to tell compilers accepting gcc extended asm syntax that the input to the asm includes the pointer and anything it can point to?
回答1:
That's correct; asking for a pointer as input to inline asm does not imply that the pointed-to memory is also an input or output or both. With a register input and register output, for all gcc knows your asm just aligns a pointer by masking off the low bits, or adds a constant to it. (In which case you would want it to optimize away a dead store.)
The simple option is asm volatile
and a "memory"
clobber1.
The narrower more specific way you're asking for is to use a "dummy" memory operand as well as the pointer in a register. Your asm template doesn't reference this operand (except maybe inside an asm comment to see what the compiler picked). It tells the compiler which memory you actually read, write, or read+write.
Dummy memory input: "m" (*(const int (*)[]) iptr)
or output: "=m" (*(int (*)[]) iptr)
. Or of course "+m"
with the same syntax.
That syntax is casting to a pointer-to-array and dereferencing, so the actual input is a C array. (If you actually have an array, not pointer, you don't need any casting and can just ask for it as a memory operand.)
If you leave the size unspecified with []
, that tells GCC that any memory accessed relative to that pointer is an input, output, or in/out operand. If you use [10]
or [some_variable]
, that tells the compiler the specific size. With runtime-variable sizes, gcc in practice misses the optimization that iptr[size+1]
is not part of the input.
GCC documents this and therefore supports it. I think it's not a strict-aliasing violation if the array element type is the same as the pointer, or maybe if it's char
.
(from the GCC manual)
An x86 example where the string memory argument is of unknown length.asm("repne scasb" : "=c" (count), "+D" (p) : "m" (*(const char (*)[]) p), "0" (-1), "a" (0));
If you can avoid using an early-clobber on the pointer input operand, the dummy memory input operand will typically pick a simple addressing mode using that same register.
But if you do use an early-clobber for strict correctness of an asm loop, sometimes a dummy operand will make gcc waste instructions (and an extra register) on a base address for the memory operand. Check the asm output of the compiler.
Background:
This is a widespread bug in inline-asm examples which often goes undetected because the asm is wrapped in a function that doesn't inline into any callers that tempt the compiler into reordering stores for merging doing dead-store elimination.
GNU C inline asm syntax is designed around describing a single instruction to the compiler. The intent is that you tell the compiler about a memory input or memory output with a "m"
or "=m"
operand constraint, and it picks the addressing mode.
Writing whole loops in inline asm requires care to make sure the compiler really knows what's going on (or asm volatile
plus a "memory"
clobber), otherwise you risk breakage when changing the surrounding code, or enabling link-time optimization that allows for cross-file inlining.
See also Looping over arrays with inline assembly for using an asm
statement as the loop body, still doing the loop logic in C. With actual (non-dummy) "m"
and "=m"
operands, the compiler can unroll the loop by using displacements in the addressing modes it chooses.
Footnote 1: A "memory"
clobber gets the compiler to treat the asm like a non-inline function call (that could read or write any memory except for locals that escape analysis has proved have not escaped). The escape analysis includes input operands to the asm statement itself, but also any global or static variables that any earlier call could have stored pointers into. So usually local loop counters don't have to be spilled/reloaded around an asm
statement with a "memory"
clobber.
asm volatile
is necessary to make sure the asm isn't optimized away even if its output operands are unused (because you require the un-declared the side-effect of writing memory to happen).
Or for memory that is only read by asm, you you need the asm to run again if the same input buffer contains different input data. Without volatile
, the asm statement could be CSEd out of a loop. (A "memory"
clobber does not make the optimizer treat all memory as an input when considering whether the asm
statement even needs to run.)
asm
with no output operands is implicitly volatile
, but it's a good idea to make it explicit. (The GCC manual has a section on asm volatile).
e.g. asm("... sum an array ..." : "=r"(sum) : "r"(pointer), "r"(end_pointer) : "memory")
has an output operand so is not implicitly volatile. If you used it like
arr[5] = 1;
total += asm_sum(arr, len);
memcpy(arr, foo, len);
total += asm_sum(arr, len);
Without volatile
the 2nd asm_sum
could optimize away, assuming that the same asm with the same input operands (pointer and length) will produce the same output. You need volatile
for any asm that's not a pure function of its explicit input operands. If it doesn't optimize away, then the "memory"
clobber will have the desired effect of requiring memory to be in sync.
来源:https://stackoverflow.com/questions/61119506/risc-v-inline-assembly-struct-optimized-away