Timer function to provide time in nano seconds using C++

前端未结

关注

 16  2035

I wish to calculate the time it took for an API to return a value. The time taken for such an action is in the space of nano seconds. As the API is a C++ class/function, I a

相关标签:

16条回答

深忆病人

2020-11-22 06:07
With that level of accuracy, it would be better to reason in CPU tick rather than in system call like clock(). And do not forget that if it takes more than one nanosecond to execute an instruction... having a nanosecond accuracy is pretty much impossible.

Still, something like that is a start:

Here's the actual code to retrieve number of 80x86 CPU clock ticks passed since the CPU was last started. It will work on Pentium and above (386/486 not supported). This code is actually MS Visual C++ specific, but can be probably very easy ported to whatever else, as long as it supports inline assembly.
```
inline __int64 GetCpuClocks()
{

    // Counter
    struct { int32 low, high; } counter;

    // Use RDTSC instruction to get clocks count
    __asm push EAX
    __asm push EDX
    __asm __emit 0fh __asm __emit 031h // RDTSC
    __asm mov counter.low, EAX
    __asm mov counter.high, EDX
    __asm pop EDX
    __asm pop EAX

    // Return result
    return *(__int64 *)(&counter);

}
```
This function has also the advantage of being extremely fast - it usually takes no more than 50 cpu cycles to execute.

Using the Timing Figures:
If you need to translate the clock counts into true elapsed time, divide the results by your chip's clock speed. Remember that the "rated" GHz is likely to be slightly different from the actual speed of your chip. To check your chip's true speed, you can use several very good utilities or the Win32 call, QueryPerformanceFrequency().
0 讨论(0)
发布评论:

提交评论
- 加载中...

暗喜

2020-11-22 06:08

Using Brock Adams's method, with a simple class:

int get_cpu_ticks()
{
    LARGE_INTEGER ticks;
    QueryPerformanceFrequency(&ticks);
    return ticks.LowPart;
}

__int64 get_cpu_clocks()
{
    struct { int32 low, high; } counter;

    __asm cpuid
    __asm push EDX
    __asm rdtsc
    __asm mov counter.low, EAX
    __asm mov counter.high, EDX
    __asm pop EDX
    __asm pop EAX

    return *(__int64 *)(&counter);
}

class cbench
{
public:
    cbench(const char *desc_in) 
         : desc(strdup(desc_in)), start(get_cpu_clocks()) { }
    ~cbench()
    {
        printf("%s took: %.4f ms\n", desc, (float)(get_cpu_clocks()-start)/get_cpu_ticks());
        if(desc) free(desc);
    }
private:
    char *desc;
    __int64 start;
};

Usage Example:

int main()
{
    {
        cbench c("test");
        ... code ...
    }
    return 0;
}

Result:

test took: 0.0002 ms

Has some function call overhead, but should be still more than fast enough :)

0 讨论(0)

野的像风

2020-11-22 06:09

Minimalistic copy&paste-struct + lazy usage

If the idea is to have a minimalistic struct that you can use for quick tests, then I suggest you just copy and paste anywhere in your C++ file right after the #include's. This is the only instance in which I sacrifice Allman-style formatting.

You can easily adjust the precision in the first line of the struct. Possible values are: nanoseconds, microseconds, milliseconds, seconds, minutes, or hours.

#include <chrono>
struct MeasureTime
{
    using precision = std::chrono::microseconds;
    std::vector<std::chrono::steady_clock::time_point> times;
    std::chrono::steady_clock::time_point oneLast;
    void p() {
        std::cout << "Mark " 
                << times.size()/2
                << ": " 
                << std::chrono::duration_cast<precision>(times.back() - oneLast).count() 
                << std::endl;
    }
    void m() {
        oneLast = times.back();
        times.push_back(std::chrono::steady_clock::now());
    }
    void t() {
        m();
        p();
        m();
    }
    MeasureTime() {
        times.push_back(std::chrono::steady_clock::now());
    }
};

Usage

MeasureTime m; // first time is already in memory
doFnc1();
m.t(); // Mark 1: next time, and print difference with previous mark
doFnc2();
m.t(); // Mark 2: next time, and print difference with previous mark
doStuff = doMoreStuff();
andDoItAgain = doStuff.aoeuaoeu();
m.t(); // prints 'Mark 3: 123123' etc...

Standard output result

Mark 1: 123
Mark 2: 32
Mark 3: 433234

If you want summary after execution

If you want the report afterwards, because for example your code in between also writes to standard output. Then add the following function to the struct (just before MeasureTime()):

void s() { // summary
    int i = 0;
    std::chrono::steady_clock::time_point tprev;
    for(auto tcur : times)
    {
        if(i > 0)
        {
            std::cout << "Mark " << i << ": "
                    << std::chrono::duration_cast<precision>(tprev - tcur).count()
                    << std::endl;
        }
        tprev = tcur;
        ++i;
    }
}

So then you can just use:

MeasureTime m;
doFnc1();
m.m();
doFnc2();
m.m();
doStuff = doMoreStuff();
andDoItAgain = doStuff.aoeuaoeu();
m.m();
m.s();

Which will list all the marks just like before, but then after the other code is executed. Note that you shouldn't use both m.s() and m.t().

0 讨论(0)

半阙折子戏

2020-11-22 06:12

To do this correctly you can use one of two ways, either go with RDTSC or with clock_gettime(). The second is about 2 times faster and has the advantage of giving the right absolute time. Note that for RDTSC to work correctly you need to use it as indicated (other comments on this page have errors, and may yield incorrect timing values on certain processors)

inline uint64_t rdtsc()
{
    uint32_t lo, hi;
    __asm__ __volatile__ (
      "xorl %%eax, %%eax\n"
      "cpuid\n"
      "rdtsc\n"
      : "=a" (lo), "=d" (hi)
      :
      : "%ebx", "%ecx" );
    return (uint64_t)hi << 32 | lo;
}

and for clock_gettime: (I chose microsecond resolution arbitrarily)

#include <time.h>
#include <sys/timeb.h>
// needs -lrt (real-time lib)
// 1970-01-01 epoch UTC time, 1 mcs resolution (divide by 1M to get time_t)
uint64_t ClockGetTime()
{
    timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 1000000LL + (uint64_t)ts.tv_nsec / 1000LL;
}

the timing and values produced:

Absolute values:
rdtsc           = 4571567254267600
clock_gettime   = 1278605535506855

Processing time: (10000000 runs)
rdtsc           = 2292547353
clock_gettime   = 1031119636

0 讨论(0)

我寻月下人不归

2020-11-22 06:12

I'm using Borland code here is the code ti_hund gives me some times a negativnumber but timing is fairly good.

#include <dos.h>

void main() 
{
struct  time t;
int Hour,Min,Sec,Hun;
gettime(&t);
Hour=t.ti_hour;
Min=t.ti_min;
Sec=t.ti_sec;
Hun=t.ti_hund;
printf("Start time is: %2d:%02d:%02d.%02d\n",
   t.ti_hour, t.ti_min, t.ti_sec, t.ti_hund);
....
your code to time
...

// read the time here remove Hours and min if the time is in sec

gettime(&t);
printf("\nTid Hour:%d Min:%d Sec:%d  Hundreds:%d\n",t.ti_hour-Hour,
                             t.ti_min-Min,t.ti_sec-Sec,t.ti_hund-Hun);
printf("\n\nAlt Ferdig Press a Key\n\n");
getch();
} // end main

0 讨论(0)

天命终不由人

2020-11-22 06:14
What others have posted about running the function repeatedly in a loop is correct.

For Linux (and BSD) you want to use clock_gettime().
```
#include <sys/time.h>

int main()
{
   timespec ts;
   // clock_gettime(CLOCK_MONOTONIC, &ts); // Works on FreeBSD
   clock_gettime(CLOCK_REALTIME, &ts); // Works on Linux
}
```
For windows you want to use the QueryPerformanceCounter. And here is more on QPC

Apparently there is a known issue with QPC on some chipsets, so you may want to make sure you do not have those chipset. Additionally some dual core AMDs may also cause a problem. See the second post by sebbbi, where he states:

QueryPerformanceCounter() and QueryPerformanceFrequency() offer a bit better resolution, but have different issues. For example in Windows XP, all AMD Athlon X2 dual core CPUs return the PC of either of the cores "randomly" (the PC sometimes jumps a bit backwards), unless you specially install AMD dual core driver package to fix the issue. We haven't noticed any other dual+ core CPUs having similar issues (p4 dual, p4 ht, core2 dual, core2 quad, phenom quad).

EDIT 2013/07/16:

It looks like there is some controversy on the efficacy of QPC under certain circumstances as stated in http://msdn.microsoft.com/en-us/library/windows/desktop/ee417693(v=vs.85).aspx

...While QueryPerformanceCounter and QueryPerformanceFrequency typically adjust for multiple processors, bugs in the BIOS or drivers may result in these routines returning different values as the thread moves from one processor to another...

However this StackOverflow answer https://stackoverflow.com/a/4588605/34329 states that QPC should work fine on any MS OS after Win XP service pack 2.

This article shows that Windows 7 can determine if the processor(s) have an invariant TSC and falls back to an external timer if they don't. http://performancebydesign.blogspot.com/2012/03/high-resolution-clocks-and-timers-for.html Synchronizing across processors is still an issue.

Other fine reading related to timers:
- https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks
- http://lwn.net/Articles/209101/
- http://performancebydesign.blogspot.com/2012/03/high-resolution-clocks-and-timers-for.html
- QueryPerformanceCounter Status?
See the comments for more details.
0 讨论(0)
发布评论:

提交评论
- 加载中...

1 2 3 下一页