Question
I basically have two vectors: one with a large number of elements, and a second with a small number of probes used to sample data from the elements. I stumbled upon the question of which order to implement the two loops in. Naturally I thought that having the outer loop over the larger vector would be beneficial.
Implementation 1:
for(auto& elem: elements) {
    for(auto& probe: probes) {
        probe.insertParticleData(elem);
    }
}
However, it seems that the second implementation takes only half the time.
Implementation 2:
for(auto& probe: probes) {
    for(auto& elem: elements) {
        probe.insertParticleData(elem);
    }
}
What could be the reason for that?
Edit:
Timings were generated by the following code:
clock_t t_begin_ps = std::clock();
... // timed code
clock_t t_end_ps = std::clock();
double elapsed_secs_ps = double(t_end_ps - t_begin_ps) / CLOCKS_PER_SEC;
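(Side note: std::clock measures CPU time rather than wall-clock time. If wall-clock timing were wanted instead, a minimal equivalent sketch using std::chrono could look like the following; the variable names simply mirror the snippet above.)

#include <chrono>

auto t_begin_ps = std::chrono::steady_clock::now();
... // timed code
auto t_end_ps = std::chrono::steady_clock::now();
double elapsed_secs_ps =
    std::chrono::duration<double>(t_end_ps - t_begin_ps).count();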
On inserting an element's data I basically do two things: test whether the distance to the probe is below a limit, and then compute an average.
bool probe::insertParticleData(const elem& pP) {
    if (!isInside(pP.position())) { return false; }
    ... // compute alpha and beta
    avg_vel = alpha*avg_vel + beta*pP.getVel();
    return true;
}
To get an idea of the memory usage: I have approx. 10k elements, which are objects with 30 double data members. For the test I used 10 probes containing 15 doubles each.
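A quick back-of-the-envelope check of those numbers (assuming 8-byte doubles and ignoring padding) puts the element data at roughly 2.4 MB and all probe data combined at roughly 1.2 KB:

#include <cstdio>

// Back-of-the-envelope footprint, assuming 8-byte doubles and no padding.
int main() {
    std::printf("elements: %zu bytes\n", 10000u * 30u * sizeof(double)); // ~2.4 MB
    std::printf("probes:   %zu bytes\n",    10u * 15u * sizeof(double)); // ~1.2 KB
}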
Answer 1:
Today's CPUs are heavily optimized for linear access to memory, so a few long loops will beat many short loops. You want the inner loop to iterate over the long vector.
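This claim can be checked with a small self-contained benchmark. The types below are hypothetical stand-ins sized like the question's (30 doubles per element, 15 per probe); they are not the original classes, and the absolute numbers will depend on compiler and hardware:

#include <array>
#include <chrono>
#include <cstdio>
#include <vector>

// Hypothetical stand-ins with the member counts stated in the question:
// 30 doubles per element, 15 doubles per probe.
struct Elem {
    std::array<double, 3> pos{};
    std::array<double, 3> vel{};
    std::array<double, 24> payload{};
};

struct Probe {
    std::array<double, 3> center{};
    double radius2 = 1.0;
    std::array<double, 3> avg_vel{};
    std::array<double, 8> payload{};

    bool insertParticleData(const Elem& e) {
        double d2 = 0.0;
        for (int i = 0; i < 3; ++i) {
            const double d = e.pos[i] - center[i];
            d2 += d * d;
        }
        if (d2 > radius2) return false;          // distance test
        for (int i = 0; i < 3; ++i)              // running average of the velocity
            avg_vel[i] = 0.9 * avg_vel[i] + 0.1 * e.vel[i];
        return true;
    }
};

template <class F>
double seconds(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}

int main() {
    std::vector<Elem> elements(10000);
    std::vector<Probe> probes(10);

    // Implementation 1: short probe vector in the inner loop.
    const double t1 = seconds([&] {
        for (auto& elem : elements)
            for (auto& probe : probes)
                probe.insertParticleData(elem);
    });

    // Implementation 2: long element vector in the inner loop.
    const double t2 = seconds([&] {
        for (auto& probe : probes)
            for (auto& elem : elements)
                probe.insertParticleData(elem);
    });

    // Print a result derived from the probes so the work cannot be optimized away.
    std::printf("impl 1: %.6f s, impl 2: %.6f s, check: %f\n",
                t1, t2, probes[0].avg_vel[0]);
}

The lambda-based timer keeps the two measurements identical apart from the loop order, so any difference comes from the ordering itself.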
Answer 2:
My guess: if insertParticleData is virtual, then in the second version the compiler can treat the function's address as a constant within the inner loop and move the vtable fetch outside of it, i.e. it effectively generates code which looks like:
for (auto& probe: probes) {
    funcPtr p = probe.insertParticleData;
    for (auto& elem: elements) {
        (*p)(elem);
    }
}
whereas in the first version, p would be computed for every inner iteration.
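One way to test this guess empirically (the types below are hypothetical, not the question's classes): if the probes are normally used through a base class with a virtual insertParticleData, declare the override (or the class) final and re-time both loop orders. Compilers can then devirtualize the call, and if the gap between the two orderings disappears, the hoisted-vtable explanation is plausible.

#include <vector>

// Hypothetical types, only meant to illustrate the test; not the question's classes.
struct Elem { double x = 0.0; };

struct ProbeBase {
    virtual ~ProbeBase() = default;
    virtual bool insertParticleData(const Elem&) { return true; }
};

// 'final' tells the compiler no further override exists, so calls made through a
// ProbeFinal reference can be devirtualized and the per-call vtable load removed.
struct ProbeFinal final : ProbeBase {
    bool insertParticleData(const Elem&) override { return true; }
};

int main() {
    std::vector<Elem>       elements(10000);
    std::vector<ProbeFinal> probes(10);

    // Re-time both loop orders with the devirtualizable type; if the difference
    // between the orderings vanishes, the hoisted-vtable guess is plausible.
    for (auto& elem : elements)
        for (auto& probe : probes)
            probe.insertParticleData(elem);
}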
Source: https://stackoverflow.com/questions/27143919/c-nested-loop-performance