Calculating the speed of routines?

What would be the best and most accurate way to determine how long it took to process a routine, such as a procedure or function?

I ask because I am currently trying to optimize a few functions in my application. When I test the changes, it is hard to tell just by looking whether there was any improvement at all. If I could get an accurate or near-accurate time for how long a routine took to process, I would have a much clearer idea of how much difference, if any, my changes to the code have made.

I considered using GetTickCount, but I am unsure whether it would be anywhere near accurate.

It would be useful to have a reusable function/procedure to calculate the time of a routine, and use it something like this:

// < prepare for calculation of code
...
ExecuteSomeCode; // < code to test
...
// < stop calculating code and return time it took to process

I look forward to hearing some suggestions.

Thanks.

Craig.

From my knowledge, the most accurate method is to use QueryPerformanceCounter together with QueryPerformanceFrequency:

code:

var
  Freq, StartCount, StopCount: Int64;
  TimingSeconds: real;
begin
  QueryPerformanceFrequency(Freq);
  QueryPerformanceCounter(StartCount);
  // Execute process that you want to time: ...
  QueryPerformanceCounter(StopCount);
  TimingSeconds := (StopCount - StartCount) / Freq;
  // Display timing: ... 
end;
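
Since the question asks for something reusable, the snippet above could be wrapped roughly as follows. This is only a sketch: the record name TQpcTimer and its methods are hypothetical, records with methods assume Delphi 2006 or later, and on newer versions the unit may need to be referenced as Winapi.Windows. Sleep(50) merely stands in for the code under test.

```pascal
program QpcTimingSketch;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

type
  // Hypothetical helper record; the name and methods are illustrative only.
  TQpcTimer = record
    Freq, StartCount: Int64;
    procedure Start;
    function ElapsedSeconds: Double;
  end;

procedure TQpcTimer.Start;
begin
  // Counter ticks per second, then the starting tick count.
  QueryPerformanceFrequency(Freq);
  QueryPerformanceCounter(StartCount);
end;

function TQpcTimer.ElapsedSeconds: Double;
var
  StopCount: Int64;
begin
  QueryPerformanceCounter(StopCount);
  // Same arithmetic as the snippet above: tick delta divided by frequency.
  Result := (StopCount - StartCount) / Freq;
end;

var
  Timer: TQpcTimer;
begin
  Timer.Start;
  Sleep(50); // stand-in for the code under test
  Writeln(Format('Elapsed: %.6f s', [Timer.ElapsedSeconds]));
end.
```

Keeping the start count inside a record means several timers can run at once without sharing global state.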

Try Eric Grange's Sampling Profiler.

From Delphi 6 upwards you can use the x86 timestamp counter.
This counts CPU cycles; on a 1 GHz processor, each count takes one nanosecond.
You can't get more accurate than that.

function RDTSC: Int64; assembler;
asm
  // RDTSC can be executed out of order, so the pipeline needs to be flushed
  // to prevent RDTSC from executing before your code is finished.  
  // Flush the pipeline
  XOR eax, eax
  PUSH EBX
  CPUID
  POP EBX
  RDTSC  //Get the CPU's time stamp counter.
end;

On x64 the following code is more accurate, because it does not suffer from the delay of CPUID.

  rdtscp        // On x64 we can use the serializing version of RDTSC
  push rbx      // Serialize the code after, to avoid OoO sneaking in
  push rax      // subsequent instructions prior to executing RDTSCP.
  push rdx      // See: http://www.intel.de/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf
  xor eax,eax
  cpuid
  pop rdx
  pop rax
  pop rbx
  shl rdx,32
  or rax,rdx

Use the above code to get the timestamp before and after executing your code.
Most accurate method possible and easy as pie.

Note that you need to run a test at least 10 times to get a good result. On the first pass the cache will be cold, and random hard disk reads and interrupts can throw off your timings.
Because this timer is so accurate, it can give you the wrong idea if you only time the first run.
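
A rough sketch of how the repeated measurement might look, reusing the 32-bit RDTSC routine from above (so this assumes a 32-bit Delphi target). CodeUnderTest is a hypothetical stand-in, and reporting the lowest cycle count is just one reasonable choice here, not something the advice above prescribes.

```pascal
program RdtscRepeatSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  Runs = 10; // the minimum number of timed runs suggested above

// Copy of the 32-bit routine above: serialize with CPUID, then read the TSC.
function RDTSC: Int64; assembler;
asm
  XOR EAX, EAX
  PUSH EBX
  CPUID
  POP EBX
  RDTSC
end;

// Hypothetical stand-in for the routine being optimized.
procedure CodeUnderTest;
var
  I: Integer;
  Sum: Int64;
begin
  Sum := 0;
  for I := 1 to 1000000 do
    Inc(Sum, I);
  if Sum < 0 then
    Writeln(Sum); // never true; it only keeps Sum in use
end;

var
  Run: Integer;
  StartCount, Cycles, Best: Int64;
begin
  CodeUnderTest; // warm-up pass: the cache is cold here, so it is not timed
  Best := High(Int64);
  for Run := 1 to Runs do
  begin
    StartCount := RDTSC;
    CodeUnderTest;
    Cycles := RDTSC - StartCount;
    if Cycles < Best then
      Best := Cycles; // keep the run least disturbed by interrupts
  end;
  Writeln(Format('Fewest cycles over %d runs: %d', [Runs, Best]));
end.
```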

Why you should not use QueryPerformanceCounter()
QueryPerformanceCounter() reports the same amount of time regardless of whether the CPU slows down; it compensates for CPU throttling. RDTSC, on the other hand, will give you the same number of cycles if your CPU slows down due to overheating or whatnot.
So if your CPU starts running hot and needs to throttle down, QueryPerformanceCounter() will say that your routine is taking more time (which is misleading) and RDTSC will say that it takes the same number of cycles (which is accurate).
This is what you want because you're interested in the number of CPU cycles your code uses, not the wall-clock time.

From the latest Intel docs: http://software.intel.com/en-us/articles/measure-code-sections-using-the-enhanced-timer/?wapkw=%28rdtsc%29

Using the Processor Clocks

This timer is very accurate. On a system with a 3GHz processor, this timer can measure events that last less than one nanosecond. [...] If the frequency changes while the targeted code is running, the final reading will be redundant since the initial and final readings were not taken using the same clock frequency. The number of clock ticks that occurred during this time will be accurate, but the elapsed time will be an unknown.

When not to use RDTSC
RDTSC is useful for basic timing. If you're timing multithreaded code on a single-CPU machine, RDTSC will work fine. If you have multiple CPUs, the start count may come from one CPU and the end count from another.
So don't use RDTSC to time multithreaded code on a multi-CPU machine. It works fine on a single-CPU machine, and for single-threaded code on a multi-CPU machine.
Also remember that RDTSC counts CPU cycles. If something takes time but doesn't use the CPU, like disk I/O or network access, then RDTSC is not a good tool.

But the documentation says RDTSC is not accurate on modern CPUs
RDTSC is not a tool for keeping track of time, it's a tool for keeping track of CPU cycles.
For that it is the only tool that is accurate. Routines that keep track of time are not accurate on modern CPUs because the CPU clock is not absolute like it used to be.

You didn't specify your Delphi version, but Delphi XE has a TStopWatch declared in unit Diagnostics. This will allow you to measure the runtime with reasonable precision.

uses
  SysUtils, Diagnostics; // SysUtils provides Format
var
  sw: TStopWatch;
begin
  sw := TStopWatch.StartNew;
  // <dosomething>
  Writeln(Format('runtime: %d ms', [sw.ElapsedMilliseconds]));
end;
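
To get closer to the reusable helper the question asks for, the stopwatch could be wrapped in a function that takes the code to time as an anonymous method. This is a minimal sketch, assuming Delphi 2010 or later (anonymous methods and TStopwatch); MeasureMs is a hypothetical name, and newer versions may need the System.Diagnostics and Winapi.Windows unit names.

```pascal
program StopwatchWrapperSketch;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Diagnostics;

// Hypothetical helper: runs Proc once and returns the elapsed milliseconds.
function MeasureMs(const Proc: TProc): Int64;
var
  sw: TStopwatch;
begin
  sw := TStopwatch.StartNew;
  Proc();
  Result := sw.ElapsedMilliseconds;
end;

var
  Elapsed: Int64;
begin
  Elapsed := MeasureMs(
    procedure
    begin
      Sleep(100); // stand-in for the code under test
    end);
  Writeln(Format('runtime: %d ms', [Elapsed]));
end.
```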

Mike Dunlavey

I ask because I am currently trying to optimize a few functions

It is natural to think that measuring is how you find out what to optimize, but there's a better way.

If something takes a large enough fraction of time (F) to be worth optimizing, then if you simply pause it at random, F is the probability you will catch it in the act. Do that several times, and you will see precisely why it's doing it, down to the exact lines of code.

More on that. Here's an example.

Fix it, and then do an overall measurement to see how much you saved, which should be about F. Rinse and repeat.
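
To put a number on "several times": if each pause is treated as an independent sample, the chance of catching the slow code follows a binomial model. The sketch below computes the probability of landing in it on at least two of N pauses. The "at least two" threshold, the function name, and the example values (F = 0.3, 10 pauses) are assumptions chosen for illustration; with those values the result comes out at roughly 0.85.

```pascal
program PauseOddsSketch;

{$APPTYPE CONSOLE}

uses
  Math;

// Probability that at least two of N random pauses land inside code that
// accounts for fraction F of the run time (binomial model):
//   1 - P(no hits) - P(exactly one hit)
function ChanceSeenTwice(F: Double; N: Integer): Double;
begin
  Result := 1.0 - IntPower(1.0 - F, N) - N * F * IntPower(1.0 - F, N - 1);
end;

begin
  // Assumed example values: F = 0.3 (30% of run time), 10 pauses.
  Writeln(ChanceSeenTwice(0.3, 10):0:3);
end.
```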

Here are some procedures I made to handle checking the duration of a function. I stuck them in a unit I called uTesting and then just throw it into the uses clause during my testing.

Declaration

  Procedure TST_StartTiming(Index : Integer = 1);
    //Starts the timer by storing now in Time
    //Index is the index of the timer to use. 100 are available

  Procedure TST_StopTiming(Index : Integer = 1;Display : Boolean = True; DisplaySM : Boolean = False);
    //Stops the timer and stores the difference between time and now into time
    //Displays the result if Display is true
    //Index is the index of the timer to use. 100 are available

  Procedure TST_ShowTime(Index : Integer = 1;Detail : Boolean = True; DisplaySM : Boolean = False);
    //In a ShowMessage displays time
    //Uses TimeToStr if Detail is false else it breaks it down (H,M,S,MS)
    //Index is the index of the timer to use. 100 are available

variables declared

var
  Time : array[1..100] of TDateTime;

Implementation

  Procedure TST_StartTiming(Index : Integer = 1);
  begin
    Time[Index] := Now;
  end; 

  Procedure TST_StopTiming(Index : Integer = 1;Display : Boolean = True; DisplaySM : Boolean = False);
  begin
    Time[Index] := Now - Time[Index];
    // Forward Index and DisplaySM so the requested timer is the one shown
    if Display then TST_ShowTime(Index, True, DisplaySM);
  end;

  Procedure TST_ShowTime(Index : Integer = 1;Detail : Boolean = True; DisplaySM : Boolean = False);
  var
    H,M,S,MS : Word;
  begin
    if Detail then
      begin
        DecodeTime(Time[Index],H,M,S,MS);
        if DisplaySM then
        ShowMessage('Hour   =   ' + FloatToStr(H)  + #13#10 +
                    'Min     =   ' + FloatToStr(M)  + #13#10 +
                    'Sec      =   ' + FloatToStr(S)  + #13#10 +
                    'MS      =   ' + FloatToStr(MS) + #13#10)
        else
        OutputDebugString(PChar('Hour   =   ' + FloatToStr(H)  + #13#10 +
                    'Min     =   ' + FloatToStr(M)  + #13#10 +
                    'Sec      =   ' + FloatToStr(S)  + #13#10 +
                    'MS      =   ' + FloatToStr(MS) + #13#10));
      end
    else
      begin
        // Short form: show and log the elapsed time as a time string
        ShowMessage(TimeToStr(Time[Index]));
        OutputDebugString(PChar(TimeToStr(Time[Index])));
      end;
  end;

Use this http://delphi.about.com/od/windowsshellapi/a/delphi-high-performance-timer-tstopwatch.htm

clock_gettime() is the high-resolution solution, precise to nanoseconds. You can also use rdtsc, which is precise to a CPU cycle, and lastly you can simply use gettimeofday().

Source: https://stackoverflow.com/questions/6030586/calculating-the-speed-of-routines

Tags

performance

Delphi

optimization

gettickcount