How would you generically detect cache line associativity from user mode code?

暗喜 2021-02-05 10:17

I'm putting together a small patch for the cachegrind/callgrind tool in valgrind which will auto-detect, using completely generic code, CPU instruction and cache configuration

3 Answers
  • 2021-02-05 10:57

    On x86 platforms you can use the cpuid instruction:

    See http://www.intel.com/content/www/us/en/processors/processor-identification-cpuid-instruction-note.html for details.

    You need something like:

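    #include <stdint.h>

    /* CPUID returns 32-bit values, so use 32-bit types rather than long. */
    uint32_t _eax, _ebx, _ecx, _edx;
    uint32_t op = func;     /* CPUID leaf to query (supplied by the caller) */
    uint32_t subleaf = 0;   /* sub-leaf index; cache leaves such as leaf 4 read it from ECX */

    asm volatile ("cpuid"
        : "=a" (_eax),
        "=b" (_ebx),
        "=c" (_ecx),
        "=d" (_edx)
        : "a" (op), "c" (subleaf)
    );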
    

    Then use the info according to the doc in the link mentioned above.
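As one concrete reading of "use the info": on Intel CPUs, leaf 4 (deterministic cache parameters) reports one cache per sub-leaf, and the associativity sits in EBX bits 31:22 as "ways minus one". The sketch below only decodes register values; `struct cache_info` and `decode_leaf4` are hypothetical names introduced here for illustration, not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of one CPUID leaf 4 sub-leaf (hypothetical helper type). */
struct cache_info {
    unsigned type;        /* 0 = no more caches, 1 = data, 2 = instruction, 3 = unified */
    unsigned level;       /* 1 = L1, 2 = L2, ... */
    unsigned ways;        /* associativity */
    unsigned partitions;  /* physical line partitions */
    unsigned line_size;   /* bytes per cache line */
    unsigned sets;        /* number of sets */
    unsigned long size;   /* total size in bytes */
};

/* Decode the EAX/EBX/ECX values returned by CPUID leaf 4 for one sub-leaf. */
static void decode_leaf4(uint32_t eax, uint32_t ebx, uint32_t ecx,
                         struct cache_info *out)
{
    assert(out != 0);
    out->type       = eax & 0x1f;                 /* EAX bits 4:0   */
    out->level      = (eax >> 5) & 0x7;           /* EAX bits 7:5   */
    out->line_size  = (ebx & 0xfff) + 1;          /* EBX bits 11:0  */
    out->partitions = ((ebx >> 12) & 0x3ff) + 1;  /* EBX bits 21:12 */
    out->ways       = ((ebx >> 22) & 0x3ff) + 1;  /* EBX bits 31:22 */
    out->sets       = ecx + 1;                    /* ECX holds sets - 1 */
    out->size = (unsigned long)out->ways * out->partitions
              * out->line_size * out->sets;
}
```

To enumerate all caches, one would run the cpuid snippet with op = 4 and sub-leaf 0, 1, 2, ... and stop when the decoded type is 0. Other vendors and older CPUs may expose this data through different leaves, so check the vendor documentation before relying on leaf 4.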

  • 2021-02-05 11:23

    Here's a scheme:

    Use a memory access pattern with stride S and N unique elements. The test first touches each unique element once, then measures the average access time per element by repeating the same pattern a very large number of times.

    Example: for S = 2 and N = 4 the address pattern would be 0,2,4,6,0,2,4,6,0,2,4,6,...
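The address pattern can be written as a one-line function. This is only a sketch; `pattern_offset` is a hypothetical name, and offsets are in whatever unit S is expressed in.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of the i-th access for stride s and n unique elements:
   the pattern walks n slots spaced s apart and then wraps around. */
static size_t pattern_offset(size_t i, size_t s, size_t n)
{
    assert(n > 0);
    return (i % n) * s;
}
```

With s = 2 and n = 4, successive values of i reproduce the sequence from the example above.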

    Consider a multi-level cache hierarchy. You can make the following reasonable assumptions:

    • The size of the (n+1)th-level cache is a power of two times the size of the nth-level cache.
    • The associativity of the (n+1)th-level cache is also a power of two times the associativity of the nth-level cache.

    These two assumptions allow us to say that if two addresses map to the same set in the (n+1)th cache (say L2), then they must also map to the same set in the nth cache (say L1).
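A small sketch of the set mapping may make this easier to check. It assumes the usual modulo indexing, set = (address / line size) mod number of sets, with number of sets = size / (ways * line size). Real hardware may hash addresses, and L2/L3 are normally indexed by physical address, so treat this as a model only. `struct cache_geom`, `num_sets` and `set_index` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry of one cache level (hypothetical helper type). */
struct cache_geom {
    uint64_t size;   /* total bytes */
    unsigned ways;   /* associativity */
    unsigned line;   /* line size in bytes */
};

/* Number of sets = size / (ways * line size). */
static uint64_t num_sets(struct cache_geom c)
{
    assert(c.ways > 0 && c.line > 0);
    return c.size / ((uint64_t)c.ways * c.line);
}

/* Set selected by an address under simple modulo indexing. */
static uint64_t set_index(struct cache_geom c, uint64_t addr)
{
    return (addr / c.line) % num_sets(c);
}
```

For illustrative geometries such as a 32 KiB 8-way L1 and a 1 MiB 16-way L2 with 64-byte lines, two addresses that differ by the L2 size give the same `set_index` in both levels.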

    Say you know the sizes of L1, L2 caches. You need to find the associativity of L2 cache.

    • set stride S = size of L2 cache (so that every access maps to the same set in L2, and in L1 too)
    • vary N (by powers of 2)

    You get the following regimes:

    • Regime 1: N <= associativity of L1 (all accesses hit in L1)
    • Regime 2: associativity of L1 < N <= associativity of L2 (all accesses miss in L1, but hit in L2)
    • Regime 3: N > associativity of L2 (all accesses miss in L2)

    So, if you plot average access time against N (with S = size of L2), you will see a step-like plot. The largest N on the lowest step gives you the associativity of L1. The largest N on the next step gives you the associativity of L2.
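A minimal sketch of the measurement, assuming a POSIX system with `clock_gettime`. It uses a pointer chain so that each load depends on the previous one. `build_chain`, `chase` and `avg_access_ns` are hypothetical names, and the stride is in bytes. Note that user-mode code controls only virtual addresses, so for physically indexed caches the result is reliable only if the buffer is physically contiguous (for example, backed by huge pages).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Link n slots spaced `stride` bytes apart into a cycle:
   slot 0 -> slot 1 -> ... -> slot n-1 -> slot 0. */
static void **build_chain(char *buf, size_t stride, size_t n)
{
    assert(n > 0 && stride >= sizeof(void *) && stride % sizeof(void *) == 0);
    for (size_t i = 0; i < n; i++) {
        void **slot = (void **)(buf + i * stride);
        *slot = buf + ((i + 1) % n) * stride;
    }
    return (void **)buf;
}

/* Follow the chain `steps` times; the result is returned so the
   dependent loads cannot be discarded as dead code. */
static void **chase(void **p, size_t steps)
{
    while (steps--)
        p = (void **)*p;
    return p;
}

/* Average nanoseconds per access for the given stride and n.
   Returns a negative value if the buffer cannot be allocated. */
static double avg_access_ns(size_t stride, size_t n, size_t steps)
{
    char *buf = malloc(stride * n);
    if (!buf)
        return -1.0;
    void **head = build_chain(buf, stride, n);
    void *volatile sink = chase(head, n);   /* warm-up: touch each element once */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    sink = chase(head, steps);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    free(buf);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

To produce the plot, one would call `avg_access_ns(l2_size, n, steps)` for n = 1, 2, 4, 8, ... with a large `steps`, and look for the values of n at which the average time jumps. Timer resolution, prefetchers and other running processes add noise, so repeating each measurement and taking the minimum is a common precaution.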

    You can repeat the same procedure between L2 and L3, and so on. Please let me know if that helps. The method of obtaining cache parameters by varying the stride of a memory access pattern is similar to that used by the LMBENCH benchmark. I don't know if lmbench infers associativity too.

  • 2021-02-05 11:23

    Could you write a small program that only accesses lines from the same set? Then you can increase the stack distance between the accesses, and when the execution time rises sharply (that is, performance falls dramatically), you can assume you have reached the associativity.

    It's probably not very stable, but maybe it could give you a lead. I hope it helps.
