I found some code that had "optimization" like this:
void somefunc(SomeStruct param){
    float x = param.x; // param.x and x are both floats. supposedly this
There are good and valid reasons to do that kind of optimization when pointers are used, because consuming all inputs first frees the compiler from possible aliasing issues that would otherwise prevent it from producing optimal code (nowadays C99's restrict, and the __restrict extension offered by C++ compilers, address this too).
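A minimal sketch of the aliasing case, using two hypothetical functions, scale_reload and scale_cached, that are not from the original code. Because out and factor are both float pointers, a store to out[i] might modify *factor, so the first version has to re-read *factor on every iteration. Copying the input into a local first lets the compiler keep it in a register, but it also changes the result whenever the pointers really do overlap.

```cpp
// Hypothetical helpers illustrating pointer aliasing; not from the original code.

// Multiplies each element of out by *factor. Since factor may point into
// out, every store to out[i] can change *factor, so the compiler must
// load *factor again on each iteration.
void scale_reload(float* out, const float* factor, int n) {
    for (int i = 0; i < n; ++i) {
        out[i] *= *factor;
    }
}

// Same computation, but the input is consumed up front. Stores through
// out cannot change the local copy f, so it can stay in a register.
// Note: the results differ from scale_reload if factor points into out.
void scale_cached(float* out, const float* factor, int n) {
    const float f = *factor;
    for (int i = 0; i < n; ++i) {
        out[i] *= f;
    }
}
```

With float a[] = {2, 3, 4}, calling scale_reload(a, &a[0], 3) yields {4, 12, 16}, because a[0] is overwritten before it is reused, whereas scale_cached(a, &a[0], 3) yields {4, 6, 8}. That semantic difference is why the compiler cannot make the caching transformation on its own; restrict (or __restrict) is the programmer's promise that such overlap never happens.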
For non-pointer types there is, in theory, an overhead, because every member is accessed indirectly through the struct's address. This might be noticeable within an inner loop and would be a diminutive overhead otherwise.
In practice, however, a modern compiler will almost always (unless there is a complex inheritance hierarchy) produce the exact same binary code.
I asked myself the exact same question about two years ago and ran a very extensive set of tests using gcc 4.4. My finding was that unless you deliberately try to trip up the compiler, there is absolutely no difference in the generated code.
The real answer is given by Piotr. This one is just for fun.
I have tested it. This code:
float somefunc(SomeStruct param, float &sum){
    float x = param.x;
    float y = param.y;
    float z = param.z;
    float xyz = x * y * z;
    sum = x + y + z;
    return xyz;
}
And this code:
float somefunc(SomeStruct param, float &sum){
    float xyz = param.x * param.y * param.z;
    sum = param.x + param.y + param.z;
    return xyz;
}
Both versions generate identical assembly code when compiled with g++ -O2. They do generate different code with optimization turned off, though. Here is the difference:
< movl -32(%rbp), %eax
< movl %eax, -4(%rbp)
< movl -28(%rbp), %eax
< movl %eax, -8(%rbp)
< movl -24(%rbp), %eax
< movl %eax, -12(%rbp)
< movss -4(%rbp), %xmm0
< mulss -8(%rbp), %xmm0
< mulss -12(%rbp), %xmm0
< movss %xmm0, -16(%rbp)
< movss -4(%rbp), %xmm0
< addss -8(%rbp), %xmm0
< addss -12(%rbp), %xmm0
---
> movss -32(%rbp), %xmm1
> movss -28(%rbp), %xmm0
> mulss %xmm1, %xmm0
> movss -24(%rbp), %xmm1
> mulss %xmm1, %xmm0
> movss %xmm0, -4(%rbp)
> movss -32(%rbp), %xmm1
> movss -28(%rbp), %xmm0
> addss %xmm1, %xmm0
> movss -24(%rbp), %xmm1
> addss %xmm1, %xmm0
The lines marked < correspond to the version with the "optimization" variables. It seems to me that, without optimization, the "optimized" version is even slower than the one without the extra variables. This is to be expected, though, since x, y and z are allocated on the stack, exactly like param. What's the point of allocating more stack variables to duplicate existing ones?
If whoever wrote that "optimization" knew the language better, they would probably have declared those variables register (a keyword since deprecated in C++11 and removed in C++17), but even that leaves the "optimized" version slightly slower and longer, at least with g++ on x86-64.
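To repeat the experiment, here is a self-contained sketch with both variants side by side. It assumes SomeStruct holds just three floats named x, y and z (its real definition isn't shown) and uses the hypothetical names somefunc_locals and somefunc_direct so both functions fit in one file. Compiling it with g++ -S -O2 and again with g++ -S -O0 and comparing the two function bodies lets you check the observations above; the exact assembly will depend on the compiler version.

```cpp
// Assumed definition: the original question never shows SomeStruct.
struct SomeStruct {
    float x, y, z;
};

// Variant with the "optimization": members are copied into locals first.
float somefunc_locals(SomeStruct param, float& sum) {
    float x = param.x;
    float y = param.y;
    float z = param.z;
    float xyz = x * y * z;
    sum = x + y + z;
    return xyz;
}

// Variant that reads the members directly.
float somefunc_direct(SomeStruct param, float& sum) {
    float xyz = param.x * param.y * param.z;
    sum = param.x + param.y + param.z;
    return xyz;
}
```

Because both functions perform the same floating-point operations in the same order, they return identical results; the local copies change only how the code is written, not what it computes.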
I'm no compiler guru, so take this with a grain of salt. I'm guessing the original author assumed that copying the values from the struct into local variables would make the compiler place them in floating-point registers, which are available on some platforms (e.g., x86). If there aren't enough registers to go around, the variables would be spilled to the stack.
That being said, unless this code is in the middle of an intensive computation or loop, I'd strive for clarity rather than speed. It's pretty rare for anyone to notice a timing difference of a few instructions.