How to ensure that RDTSC is accurate?

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-04 14:31:47

Very old CPUs have an RDTSC that is accurate: it counts actual CPU clock cycles.

The problem
However, newer CPUs have a problem.
Engineers decided that RDTSC would be great for telling time.
However, if a CPU throttles its frequency, a cycle-counting RDTSC is useless for telling time.
The aforementioned braindead engineers then decided to 'fix' this problem by having the TSC always run at the same frequency, even if the CPU slows down.

This has the 'advantage' that the TSC can be used for telling elapsed (wall clock) time. However, it makes the TSC less useful for profiling, because it no longer counts the cycles the CPU actually executed.
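To put some numbers on that, here is a small sketch with made-up figures: a TSC fixed at 3.0 GHz and a core throttled down to 1.5 GHz. Both frequencies and the helper name TscTicksFor are assumptions for illustration only, and the second call assumes the full-speed core clock equals the TSC rate. A routine that takes 1000 core cycles shows up as 2000 TSC ticks when throttled, so the reading depends on the current clock speed and not just on the code.

```pascal
program TscVsCycles;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: number of TSC ticks that elapse while the core
// executes CoreCycles cycles, given a constant-rate TSC.
function TscTicksFor(CoreCycles, CoreHz, TscHz: Int64): Int64;
begin
  // elapsed time = CoreCycles / CoreHz; ticks = elapsed time * TscHz
  Result := (CoreCycles * TscHz) div CoreHz;
end;

begin
  // Assumed figures: 3.0 GHz TSC, core throttled to 1.5 GHz.
  WriteLn(TscTicksFor(1000, 1500000000, 3000000000));  // prints 2000
  // Core running at the same rate as the TSC: both counts agree.
  WriteLn(TscTicksFor(1000, 3000000000, 3000000000));  // prints 1000
end.
```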

How to tell if your CPU is not broken
You can tell if your CPU is fine by reading the TSC_invariant bit in the CPUID.

Set EAX to 80000007H, execute CPUID, and read bit 8 of EDX.
If it is 0 then your CPU is fine.
If it's 1 then your CPU is broken and you need to make sure you profile whilst running the CPU at full throttle.

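function IsTimerBroken: boolean;
{$ifdef CPUX86}
asm
  //Make sure RDTSC measures CPU cycles, not wall clock time.
  //Assumes CPUID leaf $80000007 is supported by the CPU.
  push ebx           //CPUID overwrites EBX, which must be preserved
  mov eax,$80000007  //Has TSC Invariant support?
  cpuid
  pop ebx
  xor eax,eax        //Assume no
  and edx,$100       //test TSC_invariant bit (bit 8 of EDX)
  setnz al           //if set, return true, your PC is broken.
end;
{$endif}
{$ifdef CPUX64}
asm
  //Make sure RDTSC measures CPU cycles, not wall clock time.
  //Assumes CPUID leaf $80000007 is supported by the CPU.
  mov r8,rbx         //CPUID overwrites RBX, which must be preserved
  mov eax,$80000007  //TSC Invariant support?
  cpuid
  mov rbx,r8
  xor eax,eax        //Assume no
  and edx,$100       //test TSC_invariant bit (bit 8 of EDX)
  setnz al           //if set, return true, your PC is broken.
end;
{$endif}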

How to fix out of order execution issues
See: http://www.intel.de/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf

Use the following code:

function RDTSC: int64;
{$IFDEF CPUX64}
asm
  {$IFDEF AllowOutOfOrder}
  rdtsc
  {$ELSE}
  rdtscp        // On x64 we can use RDTSCP, which waits for all earlier instructions to execute
  push rbx      // Serialize the code after, to avoid OoO sneaking in
  push rax      // later instructions before the RDTSCP runs.
  push rdx      // See: http://www.intel.de/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf
  xor eax,eax
  cpuid
  pop rdx
  pop rax
  pop rbx
  {$ENDIF}
  shl rdx,32
  or rax,rdx
  {$ELSE}
{$IFDEF CPUX86}
asm
  {$IFNDEF AllowOutOfOrder}
  xor eax,eax
  push ebx
  cpuid         // On x86 we can't assume the existence of RDTSCP
  pop ebx       // so use CPUID to serialize
  {$ENDIF}
  rdtsc
  {$ELSE}
error!
{$ENDIF}
{$ENDIF}
end;

How to run RDTSC on a broken CPU
The trick is to force the CPU to run at full speed.
This is usually done by running the sample code many, many times.
I usually use 1,000,000 runs to start with.
I then time those 1 million runs 10 times and take the lowest time of those attempts.

Comparisons with theoretical timings show that this gives very accurate results.
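
A minimal sketch of that measuring loop, not a definitive implementation. The names MinTicks, SimulatedTicks and SimulatedWork are hypothetical. The simulated tick source only exists so that the sketch runs anywhere and gives a predictable result; for real profiling you would pass the RDTSC function from above and your own routine instead.

```pascal
program MinOfAttempts;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TTickSource = function: Int64;  // e.g. the RDTSC function
  TCodeUnderTest = procedure;     // the code being profiled

var
  SimulatedTsc: Int64 = 0;        // stand-in counter, NOT a real TSC

function SimulatedTicks: Int64;
begin
  Result := SimulatedTsc;
end;

procedure SimulatedWork;
begin
  Inc(SimulatedTsc, 4);           // pretend each run costs 4 ticks
end;

// Times Iterations runs of Code, repeats that Attempts times and
// returns the lowest tick count seen.
function MinTicks(Ticks: TTickSource; Code: TCodeUnderTest;
  Iterations, Attempts: Integer): Int64;
var
  Attempt, i: Integer;
  Start, Elapsed: Int64;
begin
  Result := High(Int64);
  for Attempt := 1 to Attempts do
  begin
    Start := Ticks();
    for i := 1 to Iterations do
      Code();
    Elapsed := Ticks() - Start;
    if Elapsed < Result then
      Result := Elapsed;
  end;
end;

begin
  // 1,000,000 runs, timed 10 times, lowest result wins.
  WriteLn(MinTicks(SimulatedTicks, SimulatedWork, 1000000, 10));  // prints 4000000
end.
```

With real code the call would look like MinTicks(RDTSC, MyRoutine, 1000000, 10), where MyRoutine is a placeholder for a parameterless procedure wrapping the code under test. Dividing the result by the number of runs gives the ticks per run; keep in mind that this figure includes the overhead of the loop and the indirect call.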
