How do you interpret cachegrind output for caching misses?

Submitted by 折月煮酒 on 2019-12-03 11:16:37
Manuel Selva

According to the documentation, Cachegrind only simulates the first-level and last-level caches:

Cachegrind simulates how your program interacts with a machine's cache hierarchy and (optionally) branch predictor. It simulates a machine with independent first-level instruction and data caches (I1 and D1), backed by a unified second-level cache (L2). This exactly matches the configuration of many modern machines.

However, some modern machines have three or four levels of cache. For these machines (in the cases where Cachegrind can auto-detect the cache configuration) Cachegrind simulates the first-level and last-level caches. The reason for this choice is that the last-level cache has the most influence on runtime, as it masks accesses to main memory. Furthermore, the L1 caches often have low associativity, so simulating them can detect cases where the code interacts badly with this cache (eg. traversing a matrix column-wise with the row length being a power of 2).

What that means is that, on your machine, you can't get L2 information: Cachegrind simulates only L1 and L3.

The first part of cachegrind's output reports information about the L1 instruction cache. In all your examples, the number of L1 instruction cache misses is insignificant: the miss rate is always 0%. This means that all your programs fit in the L1 instruction cache.

The second part of the output reports information about the L1 and LL (last-level cache, L3 in your case) data caches. Using the D1 miss rate figures, you should be able to see which version of your matrix multiplication algorithm is the most cache-efficient.

The final part of cachegrind's output sums up information about the LL cache (last-level cache, L3 in your case) for both instructions and data. It thus gives the total number of memory accesses and the percentage of requests served by the cache.
