I found some code that had an "optimization" like this:
void somefunc(SomeStruct param){
float x = param.x; // param.x and x are both floats. supposedly this
Rule of thumb: it's not slow unless the profiler says it is. Let the compiler worry about micro-optimisations (compilers are pretty smart about them; after all, they've been doing it for years) and focus on the bigger picture.
You'd have to look at the compiled code on a particular implementation to be sure, but there's no reason in principle why your preferred code (using the struct members) should necessarily be any slower than the code you've shown (copying into variables and then using the variables).
somefunc takes a struct by value, so it has its own local copy of that struct. The compiler is perfectly at liberty to apply exactly the same optimizations to the struct members as it would apply to the float variables. They're both automatic variables, and in both cases the "as-if" rule allows them to be stored in register(s) rather than in memory, provided that the function produces the correct observable behavior.
This is unless of course you take a pointer to the struct and use it, in which case the values need to be written in memory somewhere, in the correct order, pointed to by the pointer. This starts to limit optimization, and other limits are introduced by the fact that if you pass around a pointer to an automatic variable, the compiler can no longer assume that the variable name is the only reference to that memory and hence the only way its contents can be modified. Having multiple references to the same object is called "aliasing", and does sometimes block optimizations that could be made if the object was somehow known not to be aliased.
Then again, if this is an issue, and the rest of the code in the function somehow does use a pointer to the struct, then of course you could be on dodgy ground copying the values into variables from the POV of correctness. So the claimed optimization is not quite so straightforward as it looks in that case.
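To make the correctness point concrete, here is a minimal sketch of how copying a member into a local can change behavior once pointers are involved. The struct definition and the names sum_direct and sum_cached are made up for illustration; they are not from the code in the question.

```cpp
struct SomeStruct { float x, y, z; };  // assumed definition, for illustration only

// Reads p->x after the store, so it sees the new value if out points at p->x.
float sum_direct(SomeStruct* p, float* out) {
    *out = 1.0f;
    return p->x + p->y;
}

// Snapshots p->x before the store, so a later write through out is not seen.
float sum_cached(SomeStruct* p, float* out) {
    float x = p->x;
    *out = 1.0f;
    return x + p->y;
}
```

With a struct holding {5, 2, 0} and out pointing at its x member, sum_direct returns 3 and sum_cached returns 7. When out points somewhere unrelated, both return 7. So the "cached" version is only a safe rewrite if nothing else can modify the struct in between.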
Now, there may be particular compilers (or particular optimization levels) which fail to apply to structs all the optimizations that they're permitted to apply, but do apply equivalent optimizations to float variables. If so then the comment would be right, and that's why you have to check to be sure. For example, maybe compare the emitted code for this:
float somefunc(SomeStruct param){
    float x = param.x; // param.x and x are both floats. supposedly this makes it faster access
    float y = param.y;
    float z = param.z;
    for (int i = 0; i < 10; ++i) {
        x += (y + i) * z;
    }
    return x;
}
with this:
float somefunc(SomeStruct param){
    for (int i = 0; i < 10; ++i) {
        param.x += (param.y + i) * param.z;
    }
    return param.x;
}
There may also be optimization levels where the extra variables make the code worse. I'm not sure I put much trust in code comments that say "supposedly this makes it faster access"; it sounds like the author doesn't really have a clear idea why it matters. "Apparently it makes it faster access - I don't know why, but the tests to confirm this and to demonstrate that it makes a noticeable difference in the context of our program are in source control in the following location" is a lot more like it ;-)
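As a rough idea of what such a test could look like, here is a small sketch that checks the two variants agree and times each of them. The names with_locals, with_members and time_ns and the SomeStruct definition are my own assumptions, and the timing loop is deliberately crude; a real benchmark would need warm-up, repetition and a look at the emitted code.

```cpp
#include <chrono>

struct SomeStruct { float x, y, z; };  // assumed definition, for illustration only

// Variant that copies the members into locals first.
float with_locals(SomeStruct param) {
    float x = param.x;
    float y = param.y;
    float z = param.z;
    for (int i = 0; i < 10; ++i) {
        x += (y + i) * z;
    }
    return x;
}

// Variant that uses the struct members directly.
float with_members(SomeStruct param) {
    for (int i = 0; i < 10; ++i) {
        param.x += (param.y + i) * param.z;
    }
    return param.x;
}

// Calls fn reps times and returns the elapsed wall-clock time in nanoseconds.
// The volatile read and write discourage the compiler from hoisting or
// deleting the calls, so the loop really runs reps times.
template <typename Fn>
long long time_ns(Fn fn, SomeStruct arg, int reps) {
    volatile float input = arg.x;
    volatile float sink = 0.0f;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        arg.x = input;   // fresh volatile read on every iteration
        sink = fn(arg);  // volatile write keeps the result alive
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Something like time_ns(with_locals, p, 1000000) versus time_ns(with_members, p, 1000000), run at the optimization level you actually ship with, gives a first indication. If the numbers are indistinguishable, the comment in the original code has nothing to stand on.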
The compiler may generate faster code for a float-to-float copy. But when x is used, it will be converted to the internal FPU representation.
When you specify a "simple" variable (not a struct/class) to be operated upon, the system only has to go to that place and fetch the data it wants.
But when you refer to a variable inside a struct or class, like A.B, the system needs to calculate where B is inside the area called A (because there may be other variables declared before it), and that calculation takes a bit more than the plainer access described above.
In unoptimised code:
Unoptimised access to local variables and function parameters in assembly language looks more or less like this:
mov %eax, %ebp+ compile-time-constant
where %ebp is a frame pointer (sort of a 'this' pointer for a function).
It makes no difference if you access a parameter or a local variable.
The fact that you are accessing an element from a struct makes absolutely no difference from the assembly/machine point of view. Structs are constructs made in C to make programmer's life easier.
So, ultimately, my answer is: No, there is absolutely no benefit in doing that.
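If you want to convince yourself that a member's position inside a struct is just another compile-time constant, a sketch along these lines may help. The struct definition and the names kOffsetX, kOffsetZ and read_z_by_offset are assumptions for illustration only.

```cpp
#include <cstddef>
#include <cstring>

struct SomeStruct { float x, y, z; };  // assumed definition, for illustration only

// offsetof is a constant expression, so these values are fixed at compile time.
constexpr std::size_t kOffsetX = offsetof(SomeStruct, x);
constexpr std::size_t kOffsetZ = offsetof(SomeStruct, z);

static_assert(kOffsetX == 0, "the first member of a standard-layout struct is at offset 0");

// Reads s.z "by hand": the address of s plus a constant offset.
// This is roughly what a plain s.z access boils down to.
float read_z_by_offset(const SomeStruct& s) {
    const unsigned char* base = reinterpret_cast<const unsigned char*>(&s);
    float z;
    std::memcpy(&z, base + kOffsetZ, sizeof z);
    return z;
}
```

Since kOffsetZ can be used in a constexpr context, nothing about the member's position needs to be worked out at run time; it simply gets folded into the constant added to the frame pointer.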
The usual rules for optimization (Michael A. Jackson) apply: 1. Don't do it. 2. (For experts only:) Don't do it yet.
That being said, let's assume it's the innermost loop that takes 80% of the time of a performance-critical application. Even then, I doubt you will ever see any difference. Let's use this piece of code for instance:
struct Xyz {
    float x, y, z;
};
float f(Xyz param){
    return param.x + param.y + param.z;
}
float g(Xyz param){
    float x = param.x;
    float y = param.y;
    float z = param.z;
    return x + y + z;
}
Running it through LLVM shows that only with no optimizations do the two act as expected (g copies the struct members into locals, then proceeds to sum those; f sums the values fetched from param directly). With standard optimization levels, both result in identical code (extracting the values once, then summing them).
For short code, this "optimization" is actually harmful, as it copies the floats needlessly. For longer code using the members in several places, it might help a teensy bit if you actively tell your compiler to be stupid. A quick test with 65 (instead of 2) additions of the members/locals confirms this: with no optimizations, f repeatedly loads the struct members while g reuses the already extracted locals. The optimized versions are again identical and both extract the members only once. (Surprisingly, there's no strength reduction turning the additions into multiplications even with LTO enabled, but that just indicates the LLVM version used isn't optimizing too aggressively anyway, so it should work just as well in other compilers.)
So, the bottom line is: unless you know your code will have to be compiled by a compiler that's so outrageously stupid and/or ancient that it won't optimize anything, you now have proof that the compiler will make both ways equivalent, and you can thus do away with this crime against readability and brevity committed in the name of performance. (Repeat the experiment for your particular compiler if necessary.)