How can the following simple implementation of sum
be faster?
private long sum( int [] a, int begin, int end ) {
if( a == null ) {
ret
This code is simple enough that unless a
is quite small, it's probably going to be limited primarily by memory bandwidth. As such, you probably can't hope for any significant gain by working on the summing part itself (e.g., unrolling the loop, counting down instead of up, executing sums in parallel -- unless they're on separate CPUs, each with its own access to memory). The biggest gain will probably come from issuing some preload instructions so most of the data will already be in the cache by the time you need it. The rest will just (at best) get the CPU to hurry up more, so it waits longer.
Edit: It appears that most of what's above has little to do with the real question. It's kind of small, so it may be difficult to read, but, I tried just using std::accumulate()
for the initial addition, and it seemed to think that was all right: