How does lmbench measure L1 and L2 cache latencies using C? (cannot understand explanation in manual)

Submitted by 百般思念 on 2019-12-07 14:18:03

Question


I am trying to understand how lmbench measures latency for L1, L2 and main memory.

The man page for lat_mem_rd mentions the method, but it's not clear to me:

The benchmark runs as two nested loops. The outer loop is the stride size. The inner loop is the array size. For each array size, the benchmark creates a ring of pointers that point forward one stride. Traversing the array is done by

p = (char **)*p;

in a for loop (the overhead of the for loop is not significant; the loop is an unrolled loop 1000 loads long). The loop stops after doing a million loads.

How do you "create a ring of pointers that point forward one stride"? Wouldn't this mean that, with a stride of 128 bytes, you would need a linked list in which each node sits exactly 128 bytes after the previous one? malloc just returns some arbitrary free piece of memory, so I don't see how that's possible in C. Also, when I tested the line of code above, I always got a segmentation fault. What is p supposed to be initialized with?

There is a similar thread on SO, and its first answer discusses this, but it does not explain how the strided approach can be used with linked lists. I also looked at the source code itself (lat_mem_rd.c) but couldn't work it out from that either.

Any help is appreciated.


Answer 1:


You can allocate a large chunk of memory and then place the elements of the linked list within the allocated block on any boundary you want.
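
To make that concrete, here is a minimal sketch of the idea. It is not lmbench's actual code: the function names make_ring and chase are made up for illustration, and it assumes the stride is a multiple of sizeof(char *) so every slot stays properly aligned. The "nodes" are not separate malloc calls. Each one is just a pointer-sized slot at offset 0, stride, 2*stride, ... inside a single block, and each slot stores the address of the next slot, with the last one pointing back to the first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not from lmbench): build a ring of pointers inside
 * one contiguous block. The slot at offset i*stride holds the address of
 * the slot at offset (i+1)*stride; the last slot points back to the start. */
char **make_ring(char *buf, size_t size, size_t stride)
{
    /* Assumption: stride is a multiple of the pointer size, so each slot
     * is correctly aligned when buf itself comes from malloc. */
    assert(stride >= sizeof(char *) && stride % sizeof(char *) == 0);
    assert(size >= stride);

    size_t n = size / stride;               /* number of slots in the ring */
    for (size_t i = 0; i < n; i++) {
        char **slot = (char **)(buf + i * stride);
        *slot = buf + ((i + 1) % n) * stride;   /* wraps around at the end */
    }
    return (char **)buf;                    /* this is the initial value of p */
}

/* Hypothetical helper: perform `loads` dependent loads. Each load needs the
 * result of the previous one, so the CPU cannot overlap them. */
char **chase(char **p, size_t loads)
{
    while (loads-- > 0)
        p = (char **)*p;
    return p;
}
```

This also explains the segmentation fault: p has to start out pointing at one of the slots in the ring (for example the value returned by make_ring on a malloc'd buffer), otherwise the first dereference reads a garbage address. To turn this into a measurement you would time a long call to chase, divide the elapsed time by the number of loads, and use the returned pointer afterwards so the compiler cannot optimize the loop away. Repeating that for growing buffer sizes should show the per-load time stepping up as the ring stops fitting in L1, then in L2.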



Source: https://stackoverflow.com/questions/19899087/how-does-lmbench-measure-l1-and-l2-cache-latencies-using-c-cannot-understand-e
