Question
This is a simple question but I just came across it. In the code snippet below I create three pointers. I know the three will exhibit equivalent behavior (all point to the same thing), but I honestly thought the third statement in the code was the most "efficient", meaning that it would generate fewer assembly instructions to accomplish the same thing as the other two.
I assumed that the first two have to first dereference a pointer, then take the memory address of the thing that was dereferenced, and then set some pointer equal to that memory address. The third, I thought, just needed to increment a memory address by 1.
To my surprise, all three generate the same assembly instructions even with optimizations turned off: https://godbolt.org/z/Weefn4
Am I missing something obvious? Is there some compiler magic that simply recognizes these three as equivalent?
#include <stdio.h>
#include <stdint.h>
int main()
{
    unsigned int x[10];
    unsigned int* a = &x[1]; // Get address of dereferenced x[1]
    unsigned int* b = &(*(x+1)); // Get address of dereferenced *(x+1)
    unsigned int* c = x+1; // Get address x+1
    printf("%p\n", (void*)a); // %p with a void* cast is the portable way to print a pointer
    printf("%p\n", (void*)b);
    printf("%p\n", (void*)c);
}
Answer 1:
Note that gcc -O0 really only disables optimization across statements, and only some optimizations within a statement. See "Disable all optimization options in GCC".
Within a single statement, it still does some of its usual optimizations, including multiplicative inverses for division by non-power-of-2 constants.
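To make the multiplicative-inverse point concrete, here is a minimal sketch of that transformation for unsigned 32-bit division by 10. The function names div10_plain and div10_magic are hypothetical, and the constant 0xCCCCCCCD with a shift of 35 is the commonly cited pair for this divisor; the exact instruction sequence a given GCC version emits may differ.

```c
#include <stdint.h>

/* What the source code says: ordinary division by a non-power-of-2 constant. */
uint32_t div10_plain(uint32_t n)
{
    return n / 10;
}

/* Roughly what a compiler can emit instead of a div instruction:
   multiply by ceil(2^35 / 10) in 64-bit arithmetic, then shift right by 35. */
uint32_t div10_magic(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}
```

Both functions should return the same quotient for every 32-bit input; comparing the asm for div10_plain at -O0 against div10_magic is one way to see the transformation happen.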
Some other compilers do more braindead transliteration of C into asm with optimization disabled, e.g. MSVC will sometimes put a constant into a register and compare it against another constant, with two immediates. GCC never does anything that dumb; it evaluates constant expressions as far as possible and removes always-false branches.
If you want a very literal-minded compiler, take a look at TinyCC, a one-pass compiler.
In this case: the ISO C standard defines all of those in terms of x+1.
x[y] is syntactic sugar for *(x+y), so ISO C only has to define the rules for pointer math: the + operator between pointer and integral types. + is commutative (x+y and y+x are exactly equivalent), so it's not surprising that variations on that boil down to the same thing. In your case, T x[10] decays to a T* for the pointer math.
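A short sketch of those equivalences, including the commuted i[x] spelling that follows from + being commutative. The helper name same_address is hypothetical; it just checks that every spelling yields the same pointer.

```c
#include <stddef.h>

/* Returns 1 when all four spellings produce the address x + i.
   i must be in the range 0..10 so that x + i is a valid pointer. */
int same_address(size_t i)
{
    unsigned int x[10];             /* decays to unsigned int* in the expressions below */
    unsigned int *a = &x[i];        /* subscript form */
    unsigned int *b = &(*(x + i));  /* explicit pointer arithmetic, then & */
    unsigned int *c = x + i;        /* no dereference written at all */
    unsigned int *d = &i[x];        /* commuted operands: i[x] is *(i + x) */
    return a == c && b == c && d == c;
}
```

Only addresses are computed here; no element of x is ever read, so leaving the array uninitialized is fine.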
&*x "cancels out": the ISO C abstract machine never truly references the *x object, so this is safe even if x is a NULL pointer or pointing one past the end of an array or whatever. That's why this takes the address of the array element, not of some temporary *x object. So this is the kind of thing compilers need to sort out before doing code-gen, not just evaluate *x with a mov load. Because then what? Having the value in a register doesn't help you take the address of the original location.
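A small sketch of why the cancellation matters, assuming C99 or later semantics: forming &x[LEN] for an array of LEN elements is allowed because the element is never actually loaded. The names LEN and one_past_end_ok are hypothetical.

```c
enum { LEN = 10 };

/* Returns 1 if &x[LEN] and &*(x + LEN) both equal the one-past-the-end
   pointer. x[LEN] is never read: the & and the * cancel, so only the
   address arithmetic x + LEN remains. */
int one_past_end_ok(void)
{
    unsigned int x[LEN];
    unsigned int *end = x + LEN;           /* valid to form, invalid to dereference */
    unsigned int *via_index = &x[LEN];     /* treated as x + LEN */
    unsigned int *via_deref = &*(x + LEN); /* likewise */
    return via_index == end && via_deref == end;
}
```

If the compiler had to perform a real load of x[LEN] first, this function would read out of bounds; because it does not, the function only compares addresses.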
Nobody expects truly efficient code from -O0 (part of the goal is to compile fast, as well as consistent debugging), but gratuitous random extra instructions would be unwelcome even in cases where they're not dangerous.
GCC actually transforms source through GIMPLE and RTL internal representations of the program logic. It's probably during those passes where different C ways of expressing the same logic tend to become identical.
That said, it's somewhat surprising that gcc does lea rax, [rbp-80] / add rax, 4 instead of folding the + 1*sizeof(unsigned) into the LEA. It would of course do that if you used optimization (and volatile unsigned int* to force it to still materialize the unused variables, if you want it to work without the code bloat of the printf calls).
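A minimal sketch of the printf-free variant, under one assumption worth flagging: it qualifies the pointer objects themselves (unsigned int *volatile, with volatile after the *), since that placement is what obliges the compiler to store each otherwise-unused pointer. The function name pointers_agree is hypothetical.

```c
/* Each pointer variable is itself volatile, so the stores to a, b and c
   must be emitted even with optimization enabled and no printf calls. */
int pointers_agree(void)
{
    unsigned int x[10];
    unsigned int *volatile a = &x[1];       /* address of x[1] */
    unsigned int *volatile b = &(*(x + 1)); /* address of *(x+1) */
    unsigned int *volatile c = x + 1;       /* x+1 directly */
    return a == b && b == c;                /* reads the volatile pointers back */
}
```

Compiling this with optimization enabled should make it easier to compare the address computation for the three statements without the call overhead in the listing.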
Other compilers:
MSVC does have some differences: https://godbolt.org/z/xoMfT4
;; x86-64 MSVC
sub rsp, 88 ; Windows x64 doesn't have a red zone
...
// unsigned int* a = &x[1]; // Get address of dereferenced x[1]
mov eax, 4 ; even dumber than GCC
imul rax, rax, 1 ; sizeof(unsigned) * 1 I guess?
lea rax, QWORD PTR x$[rsp+rax]
mov QWORD PTR a$[rsp], rax
// unsigned int* b = &(*(x+1)); // Get address of dereferenced *(x+1)
lea rax, QWORD PTR x$[rsp+4] ; smarter than GCC
mov QWORD PTR b$[rsp], rax
// unsigned int* c = x+1; // Get address x+1
lea rax, QWORD PTR x$[rsp+4]
mov QWORD PTR c$[rsp], rax
...
c$[rsp] is just [16 + rsp], given the c$ = 16 assemble-time constant it defined earlier.
ICC and clang compile all versions the same way.
MSVC for AArch64 avoids the multiply (and uses hex literals instead of decimal). But like x86-64 GCC, it gets the array base address into a register and then adds 4. https://godbolt.org/z/ThPxx9
@@ AArch64 MSVC
...
sub sp,sp,#0x40
...
// unsigned int* a = &x[1]; // Get address of dereferenced x[1]
add x8,sp,#0x20
add x8,x8,#4
str x8,[sp]
// unsigned int* b = &(*(x+1)); // Get address of dereferenced *(x+1)
add x8,sp,#0x20
add x8,x8,#4
str x8,[sp,#8]
// unsigned int* c = x+1; // Get address x+1
add x8,sp,#0x20
add x8,x8,#4
str x8,[sp,#0x10]
// unsigned int* d = &1[x];
add x8,sp,#0x20
add x8,x8,#4
str x8,[sp,#0x18]
Clang uses the interesting strategy of getting the array base address into a register once, and adding to it for each statement. I guess it considers that x86-64 lea or AArch64 add x9, sp, #36 part of its prologue, if it wants to support debuggers that use jump between source lines, and maybe won't do it if there's any non-linear control-flow in the function?
Answer 2:
Those three are all defined to be equivalent by the Standard:
- It explicitly has a statement that &*(X) is exactly identical to (X) in all cases.
- A[B] is defined as *(A+B).
Combining the second rule with the first one, we get &(A[B]) being identical to (A+B).
In general, you will notice a bunch of other "optimizations" occur as well.
C is defined in terms of the output of an abstract machine. All programs which produce the same output are equivalent programs in the eyes of the standard.
The different optimization levels offered by a compiler cater to debuggability and compilation size/speed considerations; they aren't some intrinsic levels of the language or anything.
Source: https://stackoverflow.com/questions/65713833/generated-assembly-for-pointer-arithmetic