What would be the best and most accurate way to determine how long it took to process a routine, such as a procedure of function?
I ask because I am currently trying to optimize a few functions in my Application, when i test the changes it is hard to determine just by looking at it if there was any improvements at all. So if I could return an accurate or near accurate time it took to process a routine, I then have a more clear idea of how well, if any changes to the code have been made.
I considered using GetTickCount, but I am unsure if this would be anything near accurate?
It would be useful to have a resuable function/procedure to calculate the time of a routine, and use it something like this:
// < prepare for calcuation of code
...
ExecuteSomeCode; // < code to test
...
// < stop calcuating code and return time it took to process
I look forward to hearing some suggestions.
Thanks.
Craig.
From my knowledge, the most accurate method is by using QueryPerformanceFrequency:
code:
var
Freq, StartCount, StopCount: Int64;
TimingSeconds: real;
begin
QueryPerformanceFrequency(Freq);
QueryPerformanceCounter(StartCount);
// Execute process that you want to time: ...
QueryPerformanceCounter(StopCount);
TimingSeconds := (StopCount - StartCount) / Freq;
// Display timing: ...
end;
Try Eric Grange's Sampling Profiler.
From Delphi 6 upwards you can use the x86 Timestamp counter.
This counts CPU cycles, on a 1 Ghz processor, each count takes one nanosecond.
Can't get more accurate than that.
function RDTSC: Int64; assembler;
asm
// RDTSC can be executed out of order, so the pipeline needs to be flushed
// to prevent RDTSC from executing before your code is finished.
// Flush the pipeline
XOR eax, eax
PUSH EBX
CPUID
POP EBX
RDTSC //Get the CPU's time stamp counter.
end;
On x64 the following code is more accurate, because it does not suffer from the delay of CPUID
.
rdtscp // On x64 we can use the serializing version of RDTSC
push rbx // Serialize the code after, to avoid OoO sneaking in
push rax // subsequent instructions prior to executing RDTSCP.
push rdx // See: http://www.intel.de/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf
xor eax,eax
cpuid
pop rdx
pop rax
pop rbx
shl rdx,32
or rax,rdx
Use the above code to get the timestamp before and after executing your code.
Most accurate method possible and easy as pie.
Note that you need to run a test at least 10 times to get a good result, on the first pass the cache will be cold, and random harddisk reads and interrupts can throw off your timings.
Because this thing is so accurate it can give you the wrong idea if you only time the first run.
Why you should not use QueryPerformanceCounter()QueryPerformanceCounter()
gives the same amount of time if the CPU slows down, it compensates for CPU thottling. Whilst RDTSC will give you the same amount of cycles if your CPU slows down due to overheating or whatnot.
So if your CPU starts running hot and needs to throttle down, QueryPerformanceCounter()
will say that your routine is taking more time (which is misleading) and RDTSC will say that it takes the same amount of cycles (which is accurate).
This is what you want because you're interested in the amount of CPU-cycles your code uses, not the wall-clock time.
From the lastest intel docs: http://software.intel.com/en-us/articles/measure-code-sections-using-the-enhanced-timer/?wapkw=%28rdtsc%29
Using the Processor Clocks
This timer is very accurate. On a system with a 3GHz processor, this timer can measure events that last less than one nanosecond. [...] If the frequency changes while the targeted code is running, the final reading will be redundant since the initial and final readings were not taken using the same clock frequency. The number of clock ticks that occurred during this time will be accurate, but the elapsed time will be an unknown.
When not to use RDTSC
RDTSC is useful for basic timing. If you're timing multithreaded code on a single CPU machine, RDTSC will work fine. If you have multiple CPU's the startcount may come from one CPU and the endcount from another.
So don't use RDTSC to time multithreaded code on a multi-CPU machine. On a single CPU machine it works fine, or single threaded code on a multi-CPU machine it is also fine.
Also remember that RDTSC counts CPU cycles. If there is something that takes time but doesn't use the CPU, like disk-IO or network than RDTSC is not a good tool.
But the documentation says RDTSC is not accurate on modern CPU's
RDTSC is not a tool for keeping track of time, it's a tool for keeping track of CPU-cycles.
For that it is the only tool that is accurate. Routines that keep track of time are not accurate on modern CPU's because the CPU-clock is not absolute like it used to be.
You didn't specify your Delphi version, but Delphi XE has a TStopWatch declared in unit Diagnostics. This will allow you to measure the runtime with reasonable precision.
uses
Diagnostics;
var
sw: TStopWatch;
begin
sw := TStopWatch.StartNew;
<dosomething>
Writeln(Format('runtime: %d ms', [sw.ElapsedMilliseconds]));
end;
I ask because I am currently trying to optimize a few functions
It is natural to think that measuring is how you find out what to optimize, but there's a better way.
If something takes a large enough fraction of time (F) to be worth optimizing, then if you simply pause it at random, F is the probability you will catch it in the act. Do that several times, and you will see precisely why it's doing it, down to the exact lines of code.
More on that. Here's an example.
Fix it, and then do an overall measurement to see how much you saved, which should be about F. Rinse and repeat.
Here are some procedures I made to handle checking the duration of a function. I stuck them in a unit I called uTesting
and then just throw into the uses clause during my testing.
Declaration
Procedure TST_StartTiming(Index : Integer = 1);
//Starts the timer by storing now in Time
//Index is the index of the timer to use. 100 are available
Procedure TST_StopTiming(Index : Integer = 1;Display : Boolean = True; DisplaySM : Boolean = False);
//Stops the timer and stores the difference between time and now into time
//Displays the result if Display is true
//Index is the index of the timer to use. 100 are available
Procedure TST_ShowTime(Index : Integer = 1;Detail : Boolean = True; DisplaySM : Boolean = False);
//In a ShowMessage displays time
//Uses DateTimeToStr if Detail is false else it breaks it down (H,M,S,MS)
//Index is the index of the timer to use. 100 are available
variables declared
var
Time : array[1..100] of TDateTime;
Implementation
Procedure TST_StartTiming(Index : Integer = 1);
begin
Time[Index] := Now;
end;
Procedure TST_StopTiming(Index : Integer = 1;Display : Boolean = True; DisplaySM : Boolean = False);
begin
Time[Index] := Now - Time[Index];
if Display then TST_ShowTime;
end;
Procedure TST_ShowTime(Index : Integer = 1;Detail : Boolean = True; DisplaySM : Boolean = False);
var
H,M,S,MS : Word;
begin
if Detail then
begin
DecodeTime(Time[Index],H,M,S,MS);
if DisplaySM then
ShowMessage('Hour = ' + FloatToStr(H) + #13#10 +
'Min = ' + FloatToStr(M) + #13#10 +
'Sec = ' + FloatToStr(S) + #13#10 +
'MS = ' + FloatToStr(MS) + #13#10)
else
OutputDebugString(PChar('Hour = ' + FloatToStr(H) + #13#10 +
'Min = ' + FloatToStr(M) + #13#10 +
'Sec = ' + FloatToStr(S) + #13#10 +
'MS = ' + FloatToStr(MS) + #13#10));
end
else
ShowMessage(TimeToStr(Time[Index]));
OutputDebugString(Pchar(TimeToStr(Time[Index])));
end;
clock_gettime()
is the high solution, which is precise to nano seconds, you can also use rtdsc
, which is precise to CPU cycle, and lastly you can simply use gettimeofday()
.
来源:https://stackoverflow.com/questions/6030586/calculating-the-speed-of-routines