How to find the L1 cache line size with IO timing measurements?

深忆病人 2021-01-30 09:10

As a school assignment, I need to find a way to get the L1 data cache line size, without reading config files or using API calls. I am supposed to use memory access read/write timings to analyze and deduce it.

8 Answers
  • 2021-01-30 09:26

    Allocate a BIG char array (make sure it is too big to fit in L1 or L2 cache). Fill it with random data.

    Start walking over the array in steps of n bytes. Do something with the retrieved bytes, like summing them.

    Benchmark and calculate how many bytes/second you can process with different values of n, starting from 1 and counting up to 1000 or so. Make sure that your benchmark prints out the calculated sum, so the compiler can't possibly optimize the benchmarked code away.

    When n == your cache line size, each access will require reading a new line into the L1 cache. So the benchmark results should get slower quite sharply at that point.

    If the array is big enough, by the time you reach the end the data at the beginning of the array will already have been evicted again, which is what you want. So after you increment n and start over, the results will not be skewed by data that is still sitting in the cache from the previous pass.
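
    The stride sweep can be sketched in C roughly as follows (the 64 MiB array size, the doubling stride, and the use of clock_gettime are my own assumptions, not part of the answer above):

        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        #define ARRAY_SIZE ((size_t)64 * 1024 * 1024)   /* much bigger than L1/L2 */

        int main(void) {
            unsigned char *buf = malloc(ARRAY_SIZE);
            for (size_t i = 0; i < ARRAY_SIZE; i++)
                buf[i] = (unsigned char)rand();          /* fill with random data */

            for (size_t stride = 1; stride <= 1024; stride *= 2) {
                struct timespec t0, t1;
                unsigned long sum = 0;
                size_t accesses = 0;

                clock_gettime(CLOCK_MONOTONIC, &t0);
                for (size_t i = 0; i < ARRAY_SIZE; i += stride) {
                    sum += buf[i];                       /* touch one byte per step */
                    accesses++;
                }
                clock_gettime(CLOCK_MONOTONIC, &t1);

                double ns = ((t1.tv_sec - t0.tv_sec) * 1e9 +
                             (t1.tv_nsec - t0.tv_nsec)) / accesses;
                /* printing the sum keeps the compiler from deleting the loop */
                printf("stride %4zu: %6.2f ns per access (sum=%lu)\n",
                       stride, ns, sum);
            }
            free(buf);
            return 0;
        }

    The time per access should rise sharply once the stride reaches the line size, because from then on every access pulls in a fresh line.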

  • 2021-01-30 09:32

    I think you should write a program that walks through the array in random order instead of straight through, because modern processors do hardware prefetching. For example, make an array of ints in which each value is the index of the next cell to visit. I wrote a similar program a year ago: http://pastebin.com/9mFScs9Z (sorry for my English, I am not a native speaker).
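
    A minimal sketch of that pointer-chasing idea in C (not the code from the pastebin; the array size and the use of Sattolo's shuffle to build a single cycle are my own assumptions):

        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        #define N ((size_t)16 * 1024 * 1024)   /* elements, well beyond L1/L2 */

        int main(void) {
            size_t *next = malloc(N * sizeof *next);
            size_t i;

            /* Build a random cyclic permutation: next[i] is the index visited
             * after i. Sattolo's shuffle of the identity array yields a single
             * cycle, so the walk covers every element in a prefetcher-hostile
             * order. rand() is crude, but good enough for a sketch. */
            for (i = 0; i < N; i++) next[i] = i;
            srand(1234);
            for (i = N - 1; i > 0; i--) {
                size_t j = (size_t)rand() % i;           /* j in [0, i-1] */
                size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
            }

            /* Chase the pointers: every load depends on the previous one, so
             * the hardware prefetcher cannot hide the miss latency. */
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            size_t p = 0;
            for (i = 0; i < N; i++) p = next[p];
            clock_gettime(CLOCK_MONOTONIC, &t1);

            double ns = ((t1.tv_sec - t0.tv_sec) * 1e9 +
                         (t1.tv_nsec - t0.tv_nsec)) / N;
            printf("%.2f ns per dependent load (final index %zu)\n", ns, p);
            free(next);
            return 0;
        }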

  • 2021-01-30 09:34

    See how memtest86 is implemented. It measures and analyzes the data transfer rate for different working-set sizes; the points where the rate changes correspond to the L1, L2, and possibly L3 cache sizes.

  • 2021-01-30 09:35

    If you get stuck in the mud and can't get out, look here.

    There are manuals and code that explain how to do what you're asking. The code is pretty high quality as well. Look at "Subroutine library".

    The code and manuals are written for x86 processors.

  • 2021-01-30 09:36

    I think it should be enough to time an operation that uses some amount of memory, then progressively increase the amount of memory (operands, for instance) that the operation uses. When the operation's performance drops sharply, you have found the limit.

    I would go with just reading a bunch of bytes without printing them (printing would hurt performance so badly that it would become the bottleneck). While reading, the timing should be directly proportional to the number of bytes read until the data no longer fits in L1; after that you will see the performance hit.

    You should also allocate the memory once, at the start of the program and before you start timing.
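
    One way to sketch this in C (the buffer sizes, the fixed number of accesses per test, and the 64-byte step are my own assumptions, not part of the answer):

        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        int main(void) {
            const size_t max_bytes = (size_t)8 * 1024 * 1024;
            unsigned char *buf = malloc(max_bytes);
            for (size_t i = 0; i < max_bytes; i++) buf[i] = (unsigned char)i;

            /* Sweep working sets from 1 KiB up to 8 MiB, doing the same total
             * number of accesses for each, and report the average cost. */
            for (size_t size = 1024; size <= max_bytes; size *= 2) {
                const size_t total = 64 * 1024 * 1024;
                struct timespec t0, t1;
                unsigned long sum = 0;

                clock_gettime(CLOCK_MONOTONIC, &t0);
                for (size_t n = 0, i = 0; n < total; n++) {
                    sum += buf[i];
                    i += 64;                  /* 64-byte steps; wrap inside 'size' */
                    if (i >= size) i = 0;
                }
                clock_gettime(CLOCK_MONOTONIC, &t1);

                double ns = ((t1.tv_sec - t0.tv_sec) * 1e9 +
                             (t1.tv_nsec - t0.tv_nsec)) / total;
                printf("working set %8zu B: %5.2f ns/access (sum=%lu)\n",
                       size, ns, sum);
            }
            free(buf);
            return 0;
        }

    The average cost per access stays flat while the working set fits in L1 and steps up once it does not; the same plot also shows steps at the L2 and L3 boundaries.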

  • 2021-01-30 09:37

    You can use the CPUID instruction in assembler; although it is not portable, it will give you what you want.

    For Intel microprocessors, the cache line size can be calculated by multiplying bh (bits 15:8 of ebx) by 8 after calling cpuid function 0x1.

    For AMD microprocessors, the data cache line size is in cl and the instruction cache line size is in dl after calling cpuid function 0x80000005.

    I took this from this article here.
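
    A minimal sketch of this, assuming GCC or Clang on x86 so that the <cpuid.h> helper __get_cpuid is available (the helper is the compiler's, not something from the article):

        #include <stdio.h>
        #include <cpuid.h>

        int main(void) {
            unsigned int eax, ebx, ecx, edx;

            /* Intel: CPUID leaf 0x1. Bits 15:8 of EBX ("bh") give the line size
             * in 8-byte units (strictly the CLFLUSH line size, which normally
             * matches the L1 line size). */
            if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
                printf("line size (leaf 1): %u bytes\n", ((ebx >> 8) & 0xff) * 8);

            /* AMD: CPUID leaf 0x80000005. The low byte of ECX ("cl") is the L1
             * data cache line size and the low byte of EDX ("dl") the L1
             * instruction cache line size, both in bytes. */
            if (__get_cpuid(0x80000005, &eax, &ebx, &ecx, &edx)) {
                printf("L1D line size (leaf 0x80000005): %u bytes\n", ecx & 0xff);
                printf("L1I line size (leaf 0x80000005): %u bytes\n", edx & 0xff);
            }
            return 0;
        }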
