Consider the following code:
#include <iostream>
#include <chrono>
using Time = std::chrono::high_resolution_clock;
using us = std::chrono::microseconds;
In your original experiment, there are too many variables that can affect the measurements.
I must admit that I was very skeptical about your observations. I therefore wrote a small variant using a preallocated vector, to avoid I/O synchronisation effects:
volatile int i, k;
const int n = 1000000, kmax = 200, n_avg = 30;
std::vector<long long> v(kmax, 0);
for(k = 0; k < kmax; ++k) {
auto begin = Time::now();
for (i = 0; i < n; ++i); // <-- not optimised away, thanks to volatile
auto end = Time::now();
auto dur = std::chrono::duration_cast<us>(end - begin).count();
v[k] = dur;
}
I then ran it several times on ideone (where, given the scale of its use, we can assume that the processor is on average under constant load). Your observations did indeed seem to be confirmed.
I guess that this could be related to branch prediction, which should improve through the repetitive patterns.
However, I went on, updated the code slightly, and added a loop to repeat the experiment several times. I then also started to get runs where your observation was not confirmed (i.e. the time was higher at the end). It may, however, be that the many other processes running on ideone influence branch prediction in a different manner.
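The repeated variant could look roughly like the sketch below. It is not the exact code I ran: the function names `time_busy_loop` and `average_timings` are made up for illustration, and averaging each position of the series over `n_avg` repetitions is only one assumed way of combining the runs.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

using Time = std::chrono::high_resolution_clock;
using us = std::chrono::microseconds;

// Time one empty busy loop of n iterations, in microseconds.
// The volatile counter keeps the compiler from removing the loop.
long long time_busy_loop(int n) {
    volatile int i;
    auto begin = Time::now();
    for (i = 0; i < n; i = i + 1) {}
    auto end = Time::now();
    return std::chrono::duration_cast<us>(end - begin).count();
}

// Repeat the whole series of kmax measurements n_avg times and return,
// for each position k in the series, the mean duration over all repetitions.
// (Hypothetical helper: the averaging scheme is an assumption.)
std::vector<double> average_timings(int n, int kmax, int n_avg) {
    assert(n > 0 && kmax > 0 && n_avg > 0);
    std::vector<long long> sum(kmax, 0);  // preallocated: no I/O while timing
    for (int r = 0; r < n_avg; ++r)
        for (int k = 0; k < kmax; ++k)
            sum[k] += time_busy_loop(n);
    std::vector<double> avg(kmax);
    for (int k = 0; k < kmax; ++k)
        avg[k] = static_cast<double>(sum[k]) / n_avg;
    return avg;
}
```

Calling `average_timings(1000000, 200, 30)` would use the same parameters as above; comparing the first and last entries of the result then shows whether the timings really trend downwards over the series.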
So in the end, concluding anything would require a more careful experiment, on a machine running this benchmark (and only it) for a couple of hours.
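For such a longer run, one would probably also want a summary that is less sensitive to the occasional interference from other processes. A minimal sketch, assuming the median of the collected timings is an acceptable summary (the helper name `median` is mine, not part of any library):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Median of a set of timings. Takes a copy because nth_element reorders.
double median(std::vector<double> t) {
    assert(!t.empty());
    auto mid = t.begin() + t.size() / 2;
    std::nth_element(t.begin(), mid, t.end());
    double upper = *mid;
    if (t.size() % 2 == 1) return upper;
    // Even count: average the two middle values.
    double lower = *std::max_element(t.begin(), mid);
    return (lower + upper) / 2.0;
}
```

A single run disturbed by another process shifts a mean noticeably but barely moves the median, which is why I would prefer it here.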