Memory hardware is not optimized to deliver individual addresses: instead, it tends to operate on larger chunks of contiguous memory called cache lines. Every time you read one entry of your matrix, the entire cache line it lies in gets loaded into cache along with it.
The faster loop ordering is set up to read memory in order: each time you load a cache line, you use all of the entries in that cache line, and each pass through the outer loop reads each matrix entry only a single time.
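For concreteness, here is a sketch of what the cache-friendly traversal looks like (the names `sum_row_major`, `n`, and `m` are illustrative placeholders, not taken from your snippet):

```c
#include <stddef.h>

/* Row-major traversal: the inner loop walks each row left to right,
 * so consecutive iterations touch consecutive addresses and every
 * entry of a loaded cache line gets used. */
double sum_row_major(size_t n, const double m[n][n])
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)       /* rows */
        for (size_t j = 0; j < n; j++)   /* columns: unit stride */
            sum += m[i][j];
    return sum;
}
```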
The slower loop ordering, however, uses only a single entry from each cache line before moving on, so each cache line has to be loaded multiple times: once for each matrix entry it holds (assuming the matrix is too large for the lines to stay cached between uses). For example, if a double is 8 bytes and a cache line is 64 bytes long, each line holds eight entries, and each pass through the outer loop has to fetch every matrix entry from memory eight times rather than once.
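The slow version differs only in which loop is on the inside (same placeholder names as above):

```c
#include <stddef.h>

/* Column-major traversal: the inner loop jumps n doubles at a time,
 * touching one entry per cache line. With 8-byte doubles and 64-byte
 * lines, a matrix too large to fit in cache forces each line to be
 * fetched eight times instead of once. */
double sum_column_major(size_t n, const double m[n][n])
{
    double sum = 0.0;
    for (size_t j = 0; j < n; j++)       /* columns */
        for (size_t i = 0; i < n; i++)   /* rows: stride of n doubles */
            sum += m[i][j];
    return sum;
}
```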
All that said, if you had turned optimizations on, you would probably see no difference: optimizers understand this phenomenon, and good ones can recognize that, for this particular snippet, they are free to swap which loop is the inner one and which is the outer (an optimization known as loop interchange).
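If you want to check this yourself, a minimal timing harness along these lines will do (`N` and the array name `a` are my choices; whether the gap closes under optimization depends on how aggressive the compiler's loop-interchange pass is, which GCC, for instance, performs via -floop-interchange at -O3):

```c
#include <stdio.h>
#include <time.h>

#define N 4096  /* 4096 x 4096 doubles = 128 MB, far larger than any cache */

static double a[N][N];

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = 1.0;

    clock_t t0 = clock();
    double s1 = 0.0;
    for (int i = 0; i < N; i++)          /* fast: unit stride */
        for (int j = 0; j < N; j++)
            s1 += a[i][j];
    clock_t t1 = clock();

    double s2 = 0.0;
    for (int j = 0; j < N; j++)          /* slow: stride of N doubles */
        for (int i = 0; i < N; i++)
            s2 += a[i][j];
    clock_t t2 = clock();

    /* Printing the sums keeps the loops from being optimized away. */
    printf("row-major:    %.3f s (sum %g)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, s1);
    printf("column-major: %.3f s (sum %g)\n", (double)(t2 - t1) / CLOCKS_PER_SEC, s2);
    return 0;
}
```

Compile it once without optimization and once with it enabled (e.g. cc -O0 vs. cc -O2) and compare the two timings in each build.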
(Also, a good optimizer would make only one pass through the outermost loop, because it can recognize that the first 999 passes are irrelevant to the final value of sum.)
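That last remark is easier to see against a hypothetical reconstruction of the benchmark shape it describes (`REPS`, `n`, and `m` are stand-ins, not your actual code):

```c
#include <stddef.h>

enum { REPS = 1000 };  /* stand-in for the 1000-pass outermost loop */

double benchmark(size_t n, const double m[n][n])
{
    double sum = 0.0;
    for (int rep = 0; rep < REPS; rep++) {
        sum = 0.0;                       /* each pass discards the previous result */
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < n; j++)
                sum += m[i][j];
    }
    /* Only the final pass contributes to sum, so the optimizer is free
     * to run the rep loop just once. */
    return sum;
}
```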