Below are two programs that are almost identical except that I switched the i and j variables around. They take different amounts of time to run.
Besides the other excellent answers on cache hits, there is also a possible optimization difference. Your second loop is likely to be optimized by the compiler into something equivalent to:
    for (j=0; j<4000; j++) {
        int *p = x[j];
        for (i=0; i<4000; i++) {
            *p++ = i+j;
        }
    }
This is less likely for the first loop, because it would need to increment the pointer "p" by 4000 each time.
EDIT: p++ and even *p++ = .. can be compiled to a single CPU instruction on many CPUs. *p = ..; p += 4000 cannot, so there is less benefit in optimising it. It's also more difficult, because the compiler needs to know and use the size of the inner array. And this pattern does not occur that often in the inner loop in normal code (it occurs only for multidimensional arrays where the last index is kept constant in the loop and the second-to-last one is stepped), so optimising it is less of a priority.
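
To make the two access patterns concrete, here is a minimal sketch of both written out by hand. It is only an illustration of what a compiler might produce, not actual compiler output. The function names fill_unit_stride and fill_row_stride are hypothetical, and I am assuming a flat n*n int array in place of the question's x[4000][4000] so the size can be varied. The strided version steps an offset rather than a raw pointer, so no out-of-range pointer is formed after the last store.

```c
#include <stddef.h>

/* Unit stride: consecutive stores hit consecutive ints, so the pointer
   only needs a post-increment (the "*p++ = .." form).
   Hypothetical helper; x is assumed to point at n*n ints. */
void fill_unit_stride(int *x, size_t n)
{
    for (size_t j = 0; j < n; j++) {
        int *p = x + j * n;              /* start of row j */
        for (size_t i = 0; i < n; i++) {
            *p++ = (int)(i + j);         /* store, then advance by one int */
        }
    }
}

/* Row stride: consecutive stores are n ints apart (the "p += 4000" form
   when n is 4000), so the step depends on the inner array size. */
void fill_row_stride(int *x, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t off = i;                  /* offset of element [0][i] */
        for (size_t j = 0; j < n; j++) {
            x[off] = (int)(i + j);       /* store into element [j][i] */
            off += n;                    /* step down one whole row */
        }
    }
}
```

Both functions fill the array with the same values (element [j][i] gets i+j); only the order of the stores and the size of the step differ. Calling each with n = 4000 on a 4000*4000 buffer should reproduce the two loops from the question.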