clflush not flushing the instruction cache

予麋鹿 2021-02-06 18:01

Consider the following code segment:

#include 
#include 
#include 
#define ARRAYSIZE(arr) (sizeof(arr)/sizeof(arr[         


        
2 Answers
  •  挽巷 (OP)
    2021-02-06 18:51

    Your code does almost nothing in func, and the little it does gets inlined into test and is probably optimized out, since you never use the return value.
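
    A minimal sketch of one way to keep the call inside the measured region, assuming GCC or Clang. The body of func is a hypothetical stand-in (the question's code is not fully shown), and sink, run_measured_region and last_result are names made up for illustration. The noinline attribute keeps func out of its caller, and the volatile store keeps the compiler from discarding the result:

```c
#include <stdint.h>

/* Hypothetical stand-in for the question's func; the real body is not shown.
 * noinline asks GCC/Clang to keep this as a separate, callable function. */
__attribute__((noinline))
int func(int x)
{
    return x * 3 + 1;
}

/* Volatile sink: the store below counts as an observable side effect, so the
 * compiler cannot drop the call just because nobody reads the result. */
static volatile int sink;

/* The region that would sit between the two timestamp reads. */
void run_measured_region(int input)
{
    sink = func(input);
}

/* Reads back the last value stored by run_measured_region. */
int last_result(void)
{
    return sink;
}
```

    Calling run_measured_region(2) between the two timestamp reads should leave a real call instruction in the disassembly, and last_result() then returns 7 for this stand-in body.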

    For your code, gcc -O3 gives me:

    0000000000400620 <test>:
      400620:       53                      push   %rbx
      400621:       0f a2                   cpuid
      400623:       0f 31                   rdtsc
      400625:       48 89 d7                mov    %rdx,%rdi
      400628:       48 89 c6                mov    %rax,%rsi
      40062b:       0f a2                   cpuid
      40062d:       0f 31                   rdtsc
      40062f:       5b                      pop    %rbx
      ...
    

    So you're timing two register moves, which are very cheap in hardware. Your measurement is probably dominated by the latency of cpuid, which is relatively expensive.
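
    To see how much of the reading is just the timing harness, one option is to time an empty region and subtract that baseline. This is a sketch under assumptions, not your original code: it assumes x86-64 with GCC or Clang inline assembly, and serialized_tsc, timing_overhead and net_cycles are hypothetical helper names.

```c
#include <stdint.h>

#if !defined(__x86_64__)
#error "sketch assumes x86-64 with GCC or Clang inline assembly"
#endif

/* cpuid followed by rdtsc, mirroring the instruction pair in the disassembly.
 * cpuid overwrites eax, ebx, ecx and edx; rdtsc then returns the counter in
 * edx:eax. */
static inline uint64_t serialized_tsc(void)
{
    uint32_t eax = 0, edx;   /* eax = 0 selects cpuid leaf 0 */
    __asm__ __volatile__(
        "cpuid\n\t"
        "rdtsc"
        : "+a"(eax), "=d"(edx)
        :
        : "rbx", "rcx", "memory");
    return ((uint64_t)edx << 32) | eax;
}

/* Smallest delta seen across back-to-back timestamp reads with nothing in
 * between: an estimate of the harness cost alone. */
uint64_t timing_overhead(int trials)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < trials; i++) {
        uint64_t start = serialized_tsc();
        uint64_t end = serialized_tsc();
        uint64_t delta = end - start;
        if (delta < best)
            best = delta;
    }
    return best;
}

/* Subtract the baseline from a raw reading, clamping at zero. */
uint64_t net_cycles(uint64_t measured, uint64_t overhead)
{
    return measured > overhead ? measured - overhead : 0;
}
```

    If the raw reading for your region is about the same as timing_overhead(1000), the region itself contributes almost nothing and you are looking at cpuid latency.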

    Worse, your clflush would actually flush test as well. That means you pay the re-fetch penalty the next time you access it, and that access happens outside the rdtsc pair, so it isn't measured. The measured code, on the other hand, follows sequentially, so fetching test would probably also fetch the flushed code you measure, and it could already be cached again by the time you measure it.
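
    One way to reduce that overlap is to keep func out of line and start it on its own cache line, then flush only that line. A sketch, assuming x86-64 with SSE2 intrinsics, GCC or Clang attributes, and a 64-byte line size; the func body, flush_range and flush_func_line are hypothetical. Alignment only guarantees that code placed before func does not share its first line. Whatever the linker places after func may still share it.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* _mm_clflush, _mm_mfence (SSE2) */

#define CACHE_LINE 64    /* assumed cache line size in bytes */

/* Hypothetical stand-in for the question's func. noinline keeps it separate
 * from the timing code; aligned(64) starts it on a cache line boundary. */
__attribute__((noinline, aligned(CACHE_LINE)))
int func(int x)
{
    return x * 3 + 1;
}

/* Flush every cache line that overlaps [start, start + len). */
void flush_range(const void *start, size_t len)
{
    uintptr_t p = (uintptr_t)start & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)start + len;
    for (; p < end; p += CACHE_LINE)
        _mm_clflush((const void *)p);
    _mm_mfence();   /* order the flushes before whatever follows */
}

/* Flush only the first line of func. Its real code size is not known from C,
 * so one line is an assumption that fits this tiny body. */
void flush_func_line(void)
{
    flush_range((const void *)(uintptr_t)func, CACHE_LINE);
}
```

    Call flush_func_line() right before the first timestamp read, and make sure test itself has already executed once so its own lines are warm. The timing itself is left out of this sketch.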
