I wish to calculate the time it took for an API to return a value. The time taken for such an action is in the space of nano seconds. As the API is a C++ class/function, I a
With that level of accuracy, it would be better to reason in CPU tick rather than in system call like clock(). And do not forget that if it takes more than one nanosecond to execute an instruction... having a nanosecond accuracy is pretty much impossible.
Still, something like that is a start:
Here's the actual code to retrieve number of 80x86 CPU clock ticks passed since the CPU was last started. It will work on Pentium and above (386/486 not supported). This code is actually MS Visual C++ specific, but can be probably very easy ported to whatever else, as long as it supports inline assembly.
inline __int64 GetCpuClocks()
{
// Counter
struct { int32 low, high; } counter;
// Use RDTSC instruction to get clocks count
__asm push EAX
__asm push EDX
__asm __emit 0fh __asm __emit 031h // RDTSC
__asm mov counter.low, EAX
__asm mov counter.high, EDX
__asm pop EDX
__asm pop EAX
// Return result
return *(__int64 *)(&counter);
}
This function has also the advantage of being extremely fast - it usually takes no more than 50 cpu cycles to execute.
Using the Timing Figures:
If you need to translate the clock counts into true elapsed time, divide the results by your chip's clock speed. Remember that the "rated" GHz is likely to be slightly different from the actual speed of your chip. To check your chip's true speed, you can use several very good utilities or the Win32 call, QueryPerformanceFrequency().
Using Brock Adams's method, with a simple class:
int get_cpu_ticks()
{
LARGE_INTEGER ticks;
QueryPerformanceFrequency(&ticks);
return ticks.LowPart;
}
__int64 get_cpu_clocks()
{
struct { int32 low, high; } counter;
__asm cpuid
__asm push EDX
__asm rdtsc
__asm mov counter.low, EAX
__asm mov counter.high, EDX
__asm pop EDX
__asm pop EAX
return *(__int64 *)(&counter);
}
class cbench
{
public:
cbench(const char *desc_in)
: desc(strdup(desc_in)), start(get_cpu_clocks()) { }
~cbench()
{
printf("%s took: %.4f ms\n", desc, (float)(get_cpu_clocks()-start)/get_cpu_ticks());
if(desc) free(desc);
}
private:
char *desc;
__int64 start;
};
Usage Example:
int main()
{
{
cbench c("test");
... code ...
}
return 0;
}
Result:
test took: 0.0002 ms
Has some function call overhead, but should be still more than fast enough :)
If the idea is to have a minimalistic struct that you can use for quick tests, then I suggest you just copy and paste anywhere in your C++ file right after the #include
's. This is the only instance in which I sacrifice Allman-style formatting.
You can easily adjust the precision in the first line of the struct. Possible values are: nanoseconds
, microseconds
, milliseconds
, seconds
, minutes
, or hours
.
#include <chrono>
struct MeasureTime
{
using precision = std::chrono::microseconds;
std::vector<std::chrono::steady_clock::time_point> times;
std::chrono::steady_clock::time_point oneLast;
void p() {
std::cout << "Mark "
<< times.size()/2
<< ": "
<< std::chrono::duration_cast<precision>(times.back() - oneLast).count()
<< std::endl;
}
void m() {
oneLast = times.back();
times.push_back(std::chrono::steady_clock::now());
}
void t() {
m();
p();
m();
}
MeasureTime() {
times.push_back(std::chrono::steady_clock::now());
}
};
MeasureTime m; // first time is already in memory
doFnc1();
m.t(); // Mark 1: next time, and print difference with previous mark
doFnc2();
m.t(); // Mark 2: next time, and print difference with previous mark
doStuff = doMoreStuff();
andDoItAgain = doStuff.aoeuaoeu();
m.t(); // prints 'Mark 3: 123123' etc...
Mark 1: 123
Mark 2: 32
Mark 3: 433234
If you want the report afterwards, because for example your code in between also writes to standard output. Then add the following function to the struct (just before MeasureTime()):
void s() { // summary
int i = 0;
std::chrono::steady_clock::time_point tprev;
for(auto tcur : times)
{
if(i > 0)
{
std::cout << "Mark " << i << ": "
<< std::chrono::duration_cast<precision>(tprev - tcur).count()
<< std::endl;
}
tprev = tcur;
++i;
}
}
So then you can just use:
MeasureTime m;
doFnc1();
m.m();
doFnc2();
m.m();
doStuff = doMoreStuff();
andDoItAgain = doStuff.aoeuaoeu();
m.m();
m.s();
Which will list all the marks just like before, but then after the other code is executed. Note that you shouldn't use both m.s()
and m.t()
.
To do this correctly you can use one of two ways, either go with RDTSC
or with clock_gettime()
.
The second is about 2 times faster and has the advantage of giving the right absolute time. Note that for RDTSC
to work correctly you need to use it as indicated (other comments on this page have errors, and may yield incorrect timing values on certain processors)
inline uint64_t rdtsc()
{
uint32_t lo, hi;
__asm__ __volatile__ (
"xorl %%eax, %%eax\n"
"cpuid\n"
"rdtsc\n"
: "=a" (lo), "=d" (hi)
:
: "%ebx", "%ecx" );
return (uint64_t)hi << 32 | lo;
}
and for clock_gettime: (I chose microsecond resolution arbitrarily)
#include <time.h>
#include <sys/timeb.h>
// needs -lrt (real-time lib)
// 1970-01-01 epoch UTC time, 1 mcs resolution (divide by 1M to get time_t)
uint64_t ClockGetTime()
{
timespec ts;
clock_gettime(CLOCK_REALTIME, &ts);
return (uint64_t)ts.tv_sec * 1000000LL + (uint64_t)ts.tv_nsec / 1000LL;
}
the timing and values produced:
Absolute values:
rdtsc = 4571567254267600
clock_gettime = 1278605535506855
Processing time: (10000000 runs)
rdtsc = 2292547353
clock_gettime = 1031119636
I'm using Borland code here is the code ti_hund gives me some times a negativnumber but timing is fairly good.
#include <dos.h>
void main()
{
struct time t;
int Hour,Min,Sec,Hun;
gettime(&t);
Hour=t.ti_hour;
Min=t.ti_min;
Sec=t.ti_sec;
Hun=t.ti_hund;
printf("Start time is: %2d:%02d:%02d.%02d\n",
t.ti_hour, t.ti_min, t.ti_sec, t.ti_hund);
....
your code to time
...
// read the time here remove Hours and min if the time is in sec
gettime(&t);
printf("\nTid Hour:%d Min:%d Sec:%d Hundreds:%d\n",t.ti_hour-Hour,
t.ti_min-Min,t.ti_sec-Sec,t.ti_hund-Hun);
printf("\n\nAlt Ferdig Press a Key\n\n");
getch();
} // end main
What others have posted about running the function repeatedly in a loop is correct.
For Linux (and BSD) you want to use clock_gettime().
#include <sys/time.h>
int main()
{
timespec ts;
// clock_gettime(CLOCK_MONOTONIC, &ts); // Works on FreeBSD
clock_gettime(CLOCK_REALTIME, &ts); // Works on Linux
}
For windows you want to use the QueryPerformanceCounter. And here is more on QPC
Apparently there is a known issue with QPC on some chipsets, so you may want to make sure you do not have those chipset. Additionally some dual core AMDs may also cause a problem. See the second post by sebbbi, where he states:
QueryPerformanceCounter() and QueryPerformanceFrequency() offer a bit better resolution, but have different issues. For example in Windows XP, all AMD Athlon X2 dual core CPUs return the PC of either of the cores "randomly" (the PC sometimes jumps a bit backwards), unless you specially install AMD dual core driver package to fix the issue. We haven't noticed any other dual+ core CPUs having similar issues (p4 dual, p4 ht, core2 dual, core2 quad, phenom quad).
EDIT 2013/07/16:
It looks like there is some controversy on the efficacy of QPC under certain circumstances as stated in http://msdn.microsoft.com/en-us/library/windows/desktop/ee417693(v=vs.85).aspx
...While QueryPerformanceCounter and QueryPerformanceFrequency typically adjust for multiple processors, bugs in the BIOS or drivers may result in these routines returning different values as the thread moves from one processor to another...
However this StackOverflow answer https://stackoverflow.com/a/4588605/34329 states that QPC should work fine on any MS OS after Win XP service pack 2.
This article shows that Windows 7 can determine if the processor(s) have an invariant TSC and falls back to an external timer if they don't. http://performancebydesign.blogspot.com/2012/03/high-resolution-clocks-and-timers-for.html Synchronizing across processors is still an issue.
Other fine reading related to timers:
See the comments for more details.