I\'m investigating ways to speed up a large section of C++ code, which has automatic derivatives for computing jacobians. This involves doing some amount of work in the actual r
One way to force a compiler to optimize multiplications by 0's and 1`s is to manually unroll the loop. For simplicity let's use
#include
#include
constexpr std::size_t n = 12;
using Array = std::array;
Then we can implement a simple dot
function using fold expressions (or recursion if they are not available):
template
double dot(const Array& x, const Array& y, std::index_sequence)
{
return ((x[is] * y[is]) + ...);
}
double dot(const Array& x, const Array& y)
{
return dot(x, y, std::make_index_sequence{});
}
Now let's take a look at your function
double test(const Array& b)
{
const Array a{1}; // = {1, 0, ...}
return dot(a, b);
}
With -ffast-math
gcc 8.2 produces:
test(std::array const&):
movsd xmm0, QWORD PTR [rdi]
ret
clang 6.0.0 goes along the same lines:
test(std::array const&): # @test(std::array const&)
movsd xmm0, qword ptr [rdi] # xmm0 = mem[0],zero
ret
For example, for
double test(const Array& b)
{
const Array a{1, 1}; // = {1, 1, 0...}
return dot(a, b);
}
we get
test(std::array const&):
movsd xmm0, QWORD PTR [rdi]
addsd xmm0, QWORD PTR [rdi+8]
ret
Addition. Clang unrolls a for (std::size_t i = 0; i < n; ++i) ...
loop without all these fold expressions tricks, gcc doesn't and needs some help.