Below are two programs that are almost identical except that I switched the `i` and `j` variables around, yet they run in different amounts of time.
I'll try to give a generic answer.
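Because in C, `i[y][x]` is shorthand for `*(*(i + y) + x)`. C stores two-dimensional arrays row by row (row-major), and `i + y` skips `y` whole rows, so the element accessed lies `y*array_width + x` elements past the start of the array. (Try out the classy `int P[3]; 0[P] = 0xBEEF;`, which works because `a[b]` is just `*(a + b)`.)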
Each step in `y` jumps over a whole row, i.e. a chunk of `array_width * sizeof(array_element)` bytes. If `y` is your inner loop, then every one of your `array_width * array_height` iterations jumps to a different chunk.
By flipping the order so that `x` is the inner loop, you have only `array_height` chunk-iterations, and between any two of them you make `array_width` steps of only `sizeof(array_element)` bytes each, walking through memory sequentially.
On really old x86 CPUs this did not matter much, but modern x86 CPUs rely heavily on caching and prefetching of data. You are probably causing many cache misses with your slower iteration order.