C++ Optimize if/else condition

耶瑟儿~ 2021-02-07 05:02

I have a single line of code, that consumes 25% - 30% of the runtime of my application. It is a less-than comparator for an std::set (the set is implemented with a Red-Black-Tre

5 Answers
  •  囚心锁ツ
    2021-02-07 05:43

    Let me preface this with the fact that what I'm going to outline here is fragile and not entirely portable -- but under the right circumstances (which are pretty much what you've specified) I'm reasonably certain that it should work correctly.

    One point it depends upon is the fact that IEEE floating point numbers are carefully designed so that if you treat their bit pattern as an integer, they'll still sort into the correct order (modulo a few things like NaNs, for which there really is no "correct order").
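
    To make that claim concrete, here is a minimal sketch that compares a few non-negative floats both as values and as raw bit patterns. The helper names (float_bits, bits_less, check_nonnegative_ordering) are made up for illustration, and it assumes float is a 32-bit IEEE 754 type, which holds on mainstream platforms but is not guaranteed by the C++ standard.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the raw bit pattern of a float.
// memcpy is used instead of a pointer cast to stay clear of aliasing issues.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Compares two floats purely by their bit patterns.
// This only agrees with the numeric order for non-negative, non-NaN inputs.
bool bits_less(float a, float b) {
    return float_bits(a) < float_bits(b);
}

// Checks every pair from a small set of non-negative values.
void check_nonnegative_ordering() {
    const float values[] = {0.0f, 0.5f, 1.234f, 1.235f, 1.0e10f};
    for (float x : values) {
        for (float y : values) {
            assert(bits_less(x, y) == (x < y));
        }
    }
}
```

    Note that the agreement breaks as soon as a negative value is involved: the sign bit is the top bit, so a negative float looks larger than any positive one when viewed as an unsigned integer.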

    To make use of that, what we do is pack the Entry so there's no padding between the two pieces that make up our key. Then we ensure the structure as a whole is aligned to an 8-byte boundary. I've also changed the _id to int32_t to ensure that it stays 32 bits, even on a 64-bit system/compiler (which will almost certainly produce the best code for this comparison).

    Then, we cast the address of the structure so we can view the floating point number and the integer together as a single 64-bit integer. Since you're using a little-endian processor, to support that we need to put the less significant part (the id) first, and the more significant part (the cost) second, so when we treat them as a 64-bit integer, the floating point part will become the most significant bits, and the integer part the less significant bits:

    #include <stdint.h>  // int32_t, int64_t

    struct __attribute__((__packed__)) __attribute__((aligned(8))) Entry {
      // Do *not* reorder the following two fields or comparison will break.
      const int32_t _id;
      const float _cost;
    
      // some other vars
    
      // Initializers are listed in declaration order; id is int32_t to match _id.
      Entry(int32_t id, float cost) : _id(id), _cost(cost) {}
    };
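
    Since everything hinges on the layout, it may be worth letting the compiler check it. The following is a sketch, assuming GCC or Clang (the attributes and the __BYTE_ORDER__ macro are compiler extensions) and an Entry that contains only the two key fields; the function id_occupies_low_half is a hypothetical helper added for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct __attribute__((__packed__)) __attribute__((aligned(8))) Entry {
    const std::int32_t _id;
    const float _cost;
    Entry(std::int32_t id, float cost) : _id(id), _cost(cost) {}
};

// Compile-time checks for the assumptions the 64-bit comparison relies on.
static_assert(sizeof(float) == 4, "float must be 32 bits");
static_assert(offsetof(Entry, _id) == 0, "_id must come first");
static_assert(offsetof(Entry, _cost) == 4, "no padding between _id and _cost");
static_assert(alignof(Entry) == 8, "Entry must be 8-byte aligned");
static_assert(sizeof(Entry) >= 8, "key must span at least 64 bits");
// Predefined by GCC and Clang; other compilers need a different check.
static_assert(__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__,
              "field order assumes a little-endian target");

// Runtime view of the same idea: on a little-endian machine the low 32 bits
// of the 64-bit key hold _id, so _cost lands in the high 32 bits.
bool id_occupies_low_half(const Entry& e) {
    std::uint64_t key;
    std::memcpy(&key, &e, sizeof key);
    return static_cast<std::uint32_t>(key) == static_cast<std::uint32_t>(e._id);
}

// Small self-check on one example entry.
void check_layout_example() {
    assert(id_occupies_low_half(Entry(1234, 1.235f)));
}
```

    If somebody later reorders the fields or drops an attribute, the build fails instead of the set silently sorting wrong.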
    

    Then we have our ugly little comparison function:

    bool operator<(Entry const &a, Entry const &b) { 
       return *(int64_t const *)&a < *(int64_t const *)&b;
    }
    

    Once we've defined the struct correctly, the comparison becomes fairly straightforward: just take the first 64 bits of each struct, and compare them as if they were 64-bit integers.
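
    One caveat on the cast: reading an Entry through an int64_t pointer technically violates the strict aliasing rules, so an optimizer is allowed to break it. A variant that copies the bytes with memcpy avoids that, and compilers commonly turn a fixed-size memcpy like this into a single load, though I'd check the generated code rather than assume it. This is a sketch with a hypothetical helper key_of and an Entry reduced to the two key fields, not the code I measured.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct __attribute__((__packed__)) __attribute__((aligned(8))) Entry {
    const std::int32_t _id;   // low half of the key on little-endian targets
    const float _cost;        // high half of the key
    Entry(std::int32_t id, float cost) : _id(id), _cost(cost) {}
};

static_assert(sizeof(Entry) >= sizeof(std::int64_t),
              "Entry must hold at least 64 bits of key");

// Hypothetical helper: copies the first 8 bytes of an Entry into an integer.
inline std::int64_t key_of(Entry const& e) {
    std::int64_t key;
    std::memcpy(&key, &e, sizeof key);
    return key;
}

// Same ordering as the cast-based version, without the pointer cast.
inline bool operator<(Entry const& a, Entry const& b) {
    return key_of(a) < key_of(b);
}

// Sanity check on one pair: the lower cost sorts first even with a higher id.
void check_cost_dominates() {
    Entry cheap(1236, 1.234f), pricey(1234, 1.235f);
    assert(cheap < pricey);
    assert(!(pricey < cheap));
}
```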

    Finally a bit of test code to give at least a little assurance that it works correctly for some values:

    int main() { 
        Entry a(1236, 1.234f), b(1234, 1.235f), c(1235, 1.235f);
    
        std::cout << std::boolalpha;
    
        std::cout << (b

    At least for me, that produces the expected results:

    false
    true
    true
    false
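
    The test code above is cut off in this copy, so the exact comparisons it printed are not recoverable. As a sketch, the four comparisons below (b < a, a < b, b < c, c < b) are my assumption; on the same three entries they give the same four results. The function comparison_report is a hypothetical name, and the comparison uses memcpy rather than the pointer cast so the check does not depend on how the optimizer treats aliasing.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <string>

struct __attribute__((__packed__)) __attribute__((aligned(8))) Entry {
    const std::int32_t _id;
    const float _cost;
    Entry(std::int32_t id, float cost) : _id(id), _cost(cost) {}
};

// Compares the first 64 bits of each Entry as a signed integer.
inline bool operator<(Entry const& a, Entry const& b) {
    std::int64_t ka, kb;
    std::memcpy(&ka, &a, sizeof ka);
    std::memcpy(&kb, &b, sizeof kb);
    return ka < kb;
}

// Builds one line of "true"/"false" per assumed comparison.
std::string comparison_report() {
    Entry a(1236, 1.234f), b(1234, 1.235f), c(1235, 1.235f);
    std::ostringstream out;
    out << std::boolalpha;
    out << (b < a) << '\n';  // b costs more than a
    out << (a < b) << '\n';  // a costs less than b
    out << (b < c) << '\n';  // equal cost, b has the smaller id
    out << (c < b) << '\n';  // equal cost, c has the larger id
    return out.str();
}

// Confirms the report matches the four results listed above.
void check_report() {
    assert(comparison_report() == "false\ntrue\ntrue\nfalse\n");
}
```

    Printing comparison_report() from main with std::cout shows the four lines.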
    

    Now, some of the possible problems. First, if the two fields get rearranged relative to each other, or any other part of the struct gets put before or between them, the comparison will definitely break. Second, we're completely dependent on the sizes of the items remaining 32 bits apiece, so when they're concatenated they'll be 64 bits. Third, if somebody removes the __packed__ attribute from the struct definition, we could end up with padding between _id and _cost, again breaking the comparison. Likewise, if somebody removes the aligned(8), the code may lose some speed, because it's trying to load 8-byte quantities that aren't aligned to 8-byte boundaries (and on another processor, this might fail completely). [Edit: Oops. @rici reminded me of something I intended to list here, but forgot: this only works correctly when both the _id and the _cost are non-negative. If _cost is negative, comparisons will be messed up by the fact that IEEE floating point uses a sign-magnitude representation. If an _id is negative, its sign bit will be treated just like a normal bit in the middle of a number, so a negative _id will show up as larger than a positive _id.]
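
    If negative values can occur, one possible workaround (a different technique from the struct trick above, and not something I've measured) is to build an explicit 64-bit key whose unsigned order matches the (cost, id) order. The sketch below uses the usual bit-flipping transform; all function names are hypothetical, NaNs are not handled, and -0.0f ends up sorting just below +0.0f even though the two compare equal as floats.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Maps a float to a uint32_t whose unsigned order matches the float order.
std::uint32_t ordered_float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    // Negative: flip every bit so larger magnitudes sort lower.
    // Non-negative: set the top bit so they sort above all negatives.
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}

// Maps an int32_t to a uint32_t with the same order by flipping the sign bit.
std::uint32_t ordered_int_bits(std::int32_t i) {
    return static_cast<std::uint32_t>(i) ^ 0x80000000u;
}

// Cost goes in the high half so it dominates; id breaks ties.
std::uint64_t make_key(float cost, std::int32_t id) {
    return (static_cast<std::uint64_t>(ordered_float_bits(cost)) << 32)
           | ordered_int_bits(id);
}

// Compares key order against the plain (cost, id) order on mixed-sign values.
void check_signed_ordering() {
    const float costs[] = {-2.0f, -1.0f, 0.0f, 0.5f, 3.0f};
    const std::int32_t ids[] = {-5, -1, 0, 3, 1000};
    for (float c1 : costs)
        for (std::int32_t i1 : ids)
            for (float c2 : costs)
                for (std::int32_t i2 : ids) {
                    bool expected = (c1 < c2) || (c1 == c2 && i1 < i2);
                    assert((make_key(c1, i1) < make_key(c2, i2)) == expected);
                }
}
```

    The transform costs a few extra instructions per comparison unless the key is computed once and stored, so whether it pays off depends on the workload.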

    To summarize: this is fragile. No question at all about that. Nonetheless, it should be pretty fast -- especially if you're using a 64-bit compiler, in which case I'd expect the comparison to come out to two loads and one comparison. To make a long story short, you're at the point that you probably can't make the comparison itself any faster at all -- all you can do is try to do more in parallel, optimize memory usage patterns, etc.
