Why don't C++ compilers do better constant folding?

迷失自我 2021-01-30 16:18

I'm investigating ways to speed up a large section of C++ code, which has automatic derivatives for computing Jacobians. This involves doing some amount of work in the actual r…

3 Answers
  •  再見小時候
    2021-01-30 16:29

    I was disappointed to find that, without fast-math enabled, neither GCC 8.2, Clang 6, nor MSVC 19 was able to make any optimizations at all over the naive dot product with a matrix full of 0s.

    They have no other choice, unfortunately. Since IEEE floats have signed zeros, adding 0.0 is not an identity operation:

    -0.0 + 0.0 = 0.0 // Not -0.0!
    

    Similarly, multiplying by zero does not always yield zero:

    0.0 * Infinity = NaN // Not 0.0!
    

    So the compilers simply cannot perform these constant folds in the dot product while retaining IEEE float compliance: for all they know, your input might contain signed zeros and/or infinities.
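
    Both cases are easy to check with a small program (my own sketch, not from the original post; compile it without any fast-math flags):

        #include <cstdio>
        #include <limits>

        int main() {
            double neg_zero = -0.0;
            double inf = std::numeric_limits<double>::infinity();

            // x + 0.0 is not an identity: it turns -0.0 into +0.0.
            std::printf("-0.0 + 0.0 = %g\n", neg_zero + 0.0); // prints 0, not -0

            // x * 0.0 is not always 0.0: 0.0 * infinity is NaN.
            std::printf("0.0 * inf = %g\n", 0.0 * inf);       // prints nan
            return 0;
        }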

    You will have to use -ffast-math to get these folds, but that may have undesired consequences. You can get more fine-grained control with specific flags (see http://gcc.gnu.org/wiki/FloatingPointMath). According to the explanation above, adding the following two flags should allow the constant folding:

        -ffinite-math-only -fno-signed-zeros

    Indeed, you get the same assembly as with -ffast-math this way: https://godbolt.org/z/vGULLA. You only give up the signed zeros (probably irrelevant), NaNs and the infinities. Presumably, if you were to still produce them in your code, you would get undefined behavior, so weigh your options.


    As for why your example is not optimized better even with -ffast-math: That is on Eigen. Presumably they have vectorization on their matrix operations, which are much harder for compilers to see through. A simple loop is properly optimized with these options: https://godbolt.org/z/OppEhY
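
    For illustration, here is a sketch of the kind of simple loop that should fold under those flags (the function and array names are mine, not taken from the Godbolt link):

        // Dot product against a vector known at compile time to be all zeros.
        // With -ffinite-math-only -fno-signed-zeros (or -ffast-math), GCC and
        // Clang should be able to fold the whole function to "return 0.0;".
        // Under strict IEEE rules they must keep the multiplies and adds.
        constexpr double kZeros[4] = {0.0, 0.0, 0.0, 0.0};

        double dot(const double* x) {
            double sum = 0.0;
            for (int i = 0; i < 4; ++i)
                sum += x[i] * kZeros[i];
            return sum;
        }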
