Why don't C++ compilers do better constant folding?

迷失自我 2021-01-30 16:18

I'm investigating ways to speed up a large section of C++ code, which has automatic derivatives for computing Jacobians. This involves doing some amount of work in the actual r

3 answers
  •  礼貌的吻别
    2021-01-30 16:40

    One way to force a compiler to optimize away multiplications by 0s and 1s is to manually unroll the loop. For simplicity, let's use

    #include <array>
    #include <cstddef>
    #include <utility>   // std::index_sequence, std::make_index_sequence
    constexpr std::size_t n = 12;
    using Array = std::array<double, n>;
    

    Then we can implement a simple dot function using fold expressions (or recursion if they are not available):

    
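    // Expands to x[0]*y[0] + x[1]*y[1] + ... at compile time (C++17 fold expression).
    template<std::size_t... is>
    double dot(const Array& x, const Array& y, std::index_sequence<is...>)
    {
        return ((x[is] * y[is]) + ...);
    }
    
    double dot(const Array& x, const Array& y)
    {
        // Generates the index pack 0, 1, ..., n-1.
        return dot(x, y, std::make_index_sequence<n>{});
    }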
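
If fold expressions are not available (before C++17), the recursive alternative mentioned above could look roughly like the following. This is only a sketch: the names DotFrom and dot_recursive are made up for illustration and do not come from the original answer, and the base case adds a trailing 0.0 that the fold-expression version does not have.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t n = 12;
using Array = std::array<double, n>;

// Hypothetical helper: sums x[i]*y[i] for indices i, i+1, ..., n-1.
template<std::size_t i>
struct DotFrom
{
    static constexpr double apply(const Array& x, const Array& y)
    {
        return x[i] * y[i] + DotFrom<i + 1>::apply(x, y);
    }
};

// Base case: no indices left, so contribute nothing to the sum.
template<>
struct DotFrom<n>
{
    static constexpr double apply(const Array&, const Array&)
    {
        return 0.0;
    }
};

// Hypothetical entry point, playing the same role as dot() above.
constexpr double dot_recursive(const Array& x, const Array& y)
{
    return DotFrom<0>::apply(x, y);
}
```

Like the fold expression, this spells out every product as a separate term at compile time, so the optimizer sees the individual multiplications by constants instead of a loop.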
    

    Now let's take a look at your function

    double test(const Array& b)
    {
        const Array a{1};    // = {1, 0, ...}
        return dot(a, b);
    }
    

    With -ffast-math gcc 8.2 produces:

    test(std::array<double, 12ul> const&):
      movsd xmm0, QWORD PTR [rdi]
      ret
    

    clang 6.0.0 goes along the same lines:

    test(std::array<double, 12ul> const&): # @test(std::array<double, 12ul> const&)
      movsd xmm0, qword ptr [rdi] # xmm0 = mem[0],zero
      ret
    

    This is not limited to a single nonzero entry. For example, for

    double test(const Array& b)
    {
        const Array a{1, 1};    // = {1, 1, 0...}
        return dot(a, b);
    }
    

    we get

    test(std::array<double, 12ul> const&):
      movsd xmm0, QWORD PTR [rdi]
      addsd xmm0, QWORD PTR [rdi+8]
      ret
    

    Addendum. Clang unrolls a plain for (std::size_t i = 0; i < n; ++i) ... loop on its own, without any of these fold-expression tricks; gcc does not and needs some help.
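
For comparison, the plain-loop version referred to here might look like the sketch below. The names dot_loop and test_loop are hypothetical and chosen only to avoid clashing with dot and test above; how well a given compiler folds it depends on the compiler, its version and the flags used.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t n = 12;
using Array = std::array<double, n>;

// Plain-loop dot product: no index_sequence, no fold expression.
// Marked constexpr (C++14) only so that it can also be checked at compile time.
constexpr double dot_loop(const Array& x, const Array& y)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

// Same shape as test() above, but built on the loop version.
double test_loop(const Array& b)
{
    const Array a{1};    // = {1, 0, ...}
    return dot_loop(a, b);
}
```

Compiling test_loop with the same flags as above and comparing its assembly with that of test shows whether a particular compiler needs the manual unrolling.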
