C++: Structs slower to access than basic variables?

后端 未结 9 2426
被撕碎了的回忆
被撕碎了的回忆 2021-02-13 03:45

I found some code that had \"optimization\" like this:

void somefunc(SomeStruct param){
    float x = param.x; // param.x and x are both floats. supposedly this          


        
9条回答
  •  难免孤独
    2021-02-13 04:23

    The usual rules for optimization (Michael A. Jackson) apply: 1. Don't do it. 2. (For experts only:) Don't do it yet.

    That being said, let's assume it's the innermost loop that takes 80% of the time of a performance-critical application. Even then, I doubt you will ever see any difference. Let's use this piece of code for instance:

    struct Xyz {
        float x, y, z;
    };
    
    float f(Xyz param){
        return param.x + param.y + param.z;
    }
    
    float g(Xyz param){
        float x = param.x;
        float y = param.y;
        float z = param.z;
        return x + y + z;
    }
    

    Running it through LLVM shows: Only with no optimizations, the two act as expected (g copies the struct members into locals, then proceeds sums those; f sums the values fetched from param directly). With standard optimization levels, both result in identical code (extracting the values once, then summing them).

    For short code, this "optimization" is actually harmful, as it copies the floats needlessly. For longer code using the members in several places, it might help a teensy bit if you actively tell your compiler to be stupid. A quick test with 65 (instead of 2) additions of the members/locals confirms this: With no optimizations, f repeatedly loads the struct members while g reuses the already extracted locals. The optimized versions are again identical and both extract the members only once. (Surprisingly, there's no strength reduction turning the additions into multiplications even with LTO enabled, but that just indicates the LLVM version used isn't optimizing too agressively anyway - so it should work just as well in other compilers.)

    So, the bottom line is: Unless you know your code will have to be compiled by a compiler that's so outragously stupid and/or ancient that it won't optimize anything, you now have proof that the compiler will make both ways equivalent and can thus do away with this crime against readability and brewity commited in the name of performance. (Repeat the experiment for your particular compiler if necessary.)

提交回复
热议问题