Other people have done a good job explaining why one form of your code makes more efficient use of the memory cache than the other. I'd like to add some background information you may not be aware of: you probably don't realize just how expensive main memory accesses are nowadays.
The numbers posted in this question look to be in the right ballpark to me, and I'm going to reproduce them here because they're so important:
Core i7 Xeon 5500 Series Data Source Latency (approximate)

L1 CACHE hit,                              ~4 cycles
L2 CACHE hit,                             ~10 cycles
L3 CACHE hit, line unshared               ~40 cycles
L3 CACHE hit, shared line in another core ~65 cycles
L3 CACHE hit, modified in another core    ~75 cycles
remote L3 CACHE                      ~100-300 cycles

local  DRAM                                  ~60 ns
remote DRAM                                 ~100 ns
Note the change in units for the last two entries. Depending on exactly which model you have, this processor runs at 2.9–3.2 GHz; to keep the math simple, let's just call it 3 GHz, so one cycle is about 0.33 nanoseconds. At that rate, a local DRAM access (~60 ns) costs roughly 180 cycles and a remote one (~100 ns) roughly 300 cycles. In other words, DRAM accesses also land in the 100-300 cycle range, on par with a remote L3 hit.
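If you want to sanity-check that conversion yourself, a tiny program will do it (the 3 GHz clock is just the round figure assumed above, not a property of any particular chip):

```cpp
// Back-of-the-envelope latency conversion at an assumed 3 GHz clock.
#include <cstdio>

int main() {
    const double ns_per_cycle   = 1.0 / 3.0; // 3 GHz => 1/3 ns per cycle
    const double local_dram_ns  = 60.0;      // "local DRAM" row above
    const double remote_dram_ns = 100.0;     // "remote DRAM" row above

    std::printf("local  DRAM: ~%.0f cycles\n", local_dram_ns / ns_per_cycle);  // ~180
    std::printf("remote DRAM: ~%.0f cycles\n", remote_dram_ns / ns_per_cycle); // ~300
}
```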
The point is that the CPU could have executed hundreds of instructions in the time it takes to read one cache line from main memory. This is called the memory wall. Because of it, efficient use of the memory cache is more important than any other factor in overall performance on modern CPUs.
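To see the wall for yourself, here is a minimal, self-contained sketch (my own example, not code from the question). It sums the same 64 MiB array twice: once walking memory sequentially, and once striding a full row apart so that nearly every read misses cache and pays the DRAM latency from the table:

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    const int n = 4096;  // 4096 x 4096 ints = 64 MiB, well beyond any L3 cache
    std::vector<int> a(static_cast<std::size_t>(n) * n, 1);
    long long sum = 0;

    // Pass 1, row-major: consecutive reads hit the same 64-byte cache line.
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            sum += a[static_cast<std::size_t>(i) * n + j];
    auto t1 = std::chrono::steady_clock::now();

    // Pass 2, column-major: each read lands 16 KiB (one row) away from the
    // last, so nearly every access is a cache miss that goes out to DRAM.
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            sum += a[static_cast<std::size_t>(i) * n + j];
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::milliseconds;
    std::printf("row-major:    %lld ms\n",
                static_cast<long long>(std::chrono::duration_cast<ms>(t1 - t0).count()));
    std::printf("column-major: %lld ms\n",
                static_cast<long long>(std::chrono::duration_cast<ms>(t2 - t1).count()));

    // Use the sum so the compiler can't eliminate the loops.
    return static_cast<int>(sum & 1);
}
```

Exact numbers vary from machine to machine, but the strided pass is typically several times slower even though both passes do exactly the same arithmetic. That gap is the memory wall made visible.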