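For starters, your use of std::accumulate
is summing into an integer: the accumulator takes the type of the
initial value, and a plain 0 is an int.
So you're probably paying for a conversion on every element: the running
total is converted to floating point for the addition, and the result is
truncated back to an integer when it is stored. That also means the sum
itself loses its fractional part at every step. Try: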
sum = std::accumulate( samples, end, 0.f );
and see if that doesn't make a difference.
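
To see the effect in isolation, here is a minimal sketch. It assumes the samples are floats held in a std::vector; the function names sum_with_int_init and sum_with_float_init are made up for illustration and are not from your code.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: the initial value 0 is an int, so std::accumulate
// keeps an int accumulator and truncates after every addition.
inline int sum_with_int_init(const std::vector<float>& samples) {
    return std::accumulate(samples.begin(), samples.end(), 0);
}

// Hypothetical helper: the initial value 0.f is a float, so the
// accumulator stays a float and no integer conversion takes place.
inline float sum_with_float_init(const std::vector<float>& samples) {
    return std::accumulate(samples.begin(), samples.end(), 0.f);
}
```

With the four samples 0.5f, 0.5f, 0.5f, 0.5f, the first version returns 0 (each partial sum of 0.5 is truncated to 0), while the second returns 2.0f. Whether the conversions account for the slowdown you measured is something only a benchmark on your data can tell you.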