I have the following code that writes a global array with zeros twice, once forward and once backward.
#include
#include
#inc
If you modify the second loop to be identical to the first, the effect is the same: the second loop is still faster.
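    #include <stdio.h>
    #include <time.h>

    /* Assumed: 100,000,000 one-byte elements, consistent with the size of c quoted below. */
    #define SIZE 100000000
    char c[SIZE];   /* global array, as described in the question */

    int main(void) {
        int i;
        clock_t t = clock();
        for (i = 0; i < SIZE; i++)   /* first pass */
            c[i] = 0;
        t = clock() - t;
        printf("%ld\n\n", (long)t);  /* clock_t is not an int, so cast before printing */
        t = clock();
        for (i = 0; i < SIZE; i++)   /* second, identical pass */
            c[i] = 0;
        t = clock() - t;
        printf("%ld\n\n", (long)t);
        return 0;
    }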
This is due to the first loop loading the data into the cache, where it is readily accessible during the second loop.
Results of the above:
317841
277270
Edit: Leeor brings up a good point: c does not fit in the cache. I have an Intel Core i7 processor: http://ark.intel.com/products/37147/Intel-Core-i7-920-Processor-8M-Cache-2_66-GHz-4_80-GTs-Intel-QPI
According to the link, the L3 cache is only 8 MB, or 8,388,608 bytes, and c is 100,000,000 bytes.