Timer function to provide time in nanoseconds using C++

野趣味 2020-11-22 06:02

I want to measure how long an API call takes to return a value. The time taken for such a call is on the order of nanoseconds. As the API is a C++ class/function, I a

16 answers
  • 2020-11-22 06:07

    At that level of accuracy, it is better to reason in CPU ticks than to rely on a call like clock(). And do not forget that if a single instruction takes more than one nanosecond to execute, nanosecond accuracy is pretty much impossible.

    Still, something like that is a start:

    Here's the actual code to retrieve the number of 80x86 CPU clock ticks that have passed since the CPU was last started. It works on Pentium and above (386/486 not supported). The code is MS Visual C++ specific, but can probably be ported very easily to any other compiler that supports inline assembly.

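    // MSVC, 32-bit x86 only: inline __asm is not available in x64 builds.
    inline __int64 GetCpuClocks()
    {

        // Counter: the two 32-bit halves of the 64-bit time-stamp counter
        struct { __int32 low, high; } counter;

        // Use the RDTSC instruction to get the clock count (result in EDX:EAX)
        __asm push EAX
        __asm push EDX
        __asm __emit 0fh __asm __emit 031h // RDTSC opcode bytes
        __asm mov counter.low, EAX
        __asm mov counter.high, EDX
        __asm pop EDX
        __asm pop EAX

        // Return both halves reinterpreted as one 64-bit value
        return *(__int64 *)(&counter);

    }
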
    

    This function also has the advantage of being extremely fast: it usually takes no more than 50 CPU cycles to execute.

    Using the Timing Figures:
    If you need to translate the clock counts into true elapsed time, divide the results by your chip's clock speed. Remember that the "rated" GHz is likely to be slightly different from the actual speed of your chip. To check your chip's true speed, you can use one of several very good utilities. The Win32 call QueryPerformanceFrequency() is sometimes suggested for this, but it reports the frequency of the performance counter, which is not necessarily the CPU clock rate.
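
    The division described above can be sketched as a small helper. This is only an illustration: the name ticks_to_nanoseconds and the ticks_per_second parameter are hypothetical, and the caller is assumed to have obtained the tick rate separately (for example from a measurement utility).

    ```cpp
    #include <cstdint>

    // Convert the difference between two tick-counter readings to nanoseconds.
    // ticks_per_second is an assumption supplied by the caller; it is not
    // queried from the hardware here.
    inline double ticks_to_nanoseconds(std::uint64_t start_ticks,
                                       std::uint64_t end_ticks,
                                       double ticks_per_second)
    {
        // Unsigned subtraction stays correct if the counter wrapped once.
        const std::uint64_t elapsed = end_ticks - start_ticks;
        return static_cast<double>(elapsed) * 1e9 / ticks_per_second;
    }
    ```

    For example, 3000 ticks on a counter assumed to run at 3 GHz correspond to about 1000 ns.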

  • 2020-11-22 06:08

    Using Brock Adams's method, with a simple class:

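    #include <windows.h>  // LARGE_INTEGER, QueryPerformanceFrequency
    #include <stdio.h>    // printf
    #include <stdlib.h>   // free
    #include <string.h>   // strdup

    // Performance-counter frequency in ticks per second. Dividing TSC clocks
    // by this value assumes both counters run at the same rate, which is not
    // guaranteed on every system.
    __int64 get_cpu_ticks()
    {
        LARGE_INTEGER ticks;
        QueryPerformanceFrequency(&ticks);
        return ticks.QuadPart;
    }

    // MSVC, 32-bit x86 only.
    __int64 get_cpu_clocks()
    {
        struct { __int32 low, high; } counter;

        // Every push is matched by a pop so the stack stays balanced.
        __asm push EAX
        __asm push EDX
        __asm xor EAX, EAX
        __asm cpuid
        __asm rdtsc
        __asm mov counter.low, EAX
        __asm mov counter.high, EDX
        __asm pop EDX
        __asm pop EAX

        return *(__int64 *)(&counter);
    }

    // Prints the time spent in the enclosing scope when it is destroyed.
    class cbench
    {
    public:
        cbench(const char *desc_in)
             : desc(strdup(desc_in)), start(get_cpu_clocks()) { }
        ~cbench()
        {
            // clocks / (ticks per second) is in seconds; scale to milliseconds
            printf("%s took: %.4f ms\n", desc,
                   1000.0 * (double)(get_cpu_clocks() - start) / (double)get_cpu_ticks());
            if(desc) free(desc);
        }
    private:
        char *desc;
        __int64 start;
    };
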
    

    Usage Example:

    int main()
    {
        {
            cbench c("test");
            ... code ...
        }
        return 0;
    }
    

    Result:

    test took: 0.0002 ms

    It has some function call overhead, but should still be more than fast enough :)
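
    For compilers without MSVC inline assembly, the same scope-based idea can be sketched with std::chrono. This is a swapped-in technique rather than the method above: it reads std::chrono::steady_clock instead of the TSC, and the class name ScopedTimer and its elapsed_ns() member are hypothetical.

    ```cpp
    #include <chrono>
    #include <cstdio>
    #include <string>
    #include <utility>

    // Hypothetical portable counterpart of cbench: reports the time spent in
    // the enclosing scope when the object is destroyed.
    class ScopedTimer
    {
    public:
        explicit ScopedTimer(std::string label)
            : label_(std::move(label)), start_(std::chrono::steady_clock::now())
        {
        }

        ScopedTimer(const ScopedTimer&) = delete;
        ScopedTimer& operator=(const ScopedTimer&) = delete;

        // Nanoseconds elapsed since construction.
        long long elapsed_ns() const
        {
            const auto now = std::chrono::steady_clock::now();
            return std::chrono::duration_cast<std::chrono::nanoseconds>(now - start_).count();
        }

        ~ScopedTimer()
        {
            std::printf("%s took: %lld ns\n", label_.c_str(), elapsed_ns());
        }

    private:
        std::string label_;
        std::chrono::steady_clock::time_point start_;
    };
    ```

    It is used the same way as cbench: declare ScopedTimer t("test"); at the top of a block and the elapsed time is printed when the block ends. The resolution actually achieved depends on the platform's steady_clock.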

  • 2020-11-22 06:09

    Minimalistic copy&paste-struct + lazy usage

    If the idea is to have a minimalistic struct that you can use for quick tests, then I suggest you just copy and paste the following anywhere in your C++ file, right after the #includes. This is the only instance in which I sacrifice Allman-style formatting.

    You can easily adjust the precision in the first line of the struct. Possible values are: nanoseconds, microseconds, milliseconds, seconds, minutes, or hours.

    #include <chrono>
    #include <iostream>  // std::cout
    #include <vector>    // std::vector
    struct MeasureTime
    {
        using precision = std::chrono::microseconds;
        std::vector<std::chrono::steady_clock::time_point> times;
        std::chrono::steady_clock::time_point oneLast;
        void p() {
            std::cout << "Mark " 
                    << times.size()/2
                    << ": " 
                    << std::chrono::duration_cast<precision>(times.back() - oneLast).count() 
                    << std::endl;
        }
        void m() {
            oneLast = times.back();
            times.push_back(std::chrono::steady_clock::now());
        }
        void t() {
            m();
            p();
            m();
        }
        MeasureTime() {
            times.push_back(std::chrono::steady_clock::now());
        }
    };
    

    Usage

    MeasureTime m; // first time is already in memory
    doFnc1();
    m.t(); // Mark 1: next time, and print difference with previous mark
    doFnc2();
    m.t(); // Mark 2: next time, and print difference with previous mark
    doStuff = doMoreStuff();
    andDoItAgain = doStuff.aoeuaoeu();
    m.t(); // prints 'Mark 3: 123123' etc...
    

    Standard output result

    Mark 1: 123
    Mark 2: 32
    Mark 3: 433234
    

    If you want summary after execution

    If you want the report afterwards, for example because the code in between also writes to standard output, add the following function to the struct (just before MeasureTime()):

    void s() { // summary
        int i = 0;
        std::chrono::steady_clock::time_point tprev;
        for(auto tcur : times)
        {
            if(i > 0)
            {
                std::cout << "Mark " << i << ": "
                        << std::chrono::duration_cast<precision>(tcur - tprev).count() // later minus earlier, so the value is positive
                        << std::endl;
            }
            tprev = tcur;
            ++i;
        }
    }
    

    So then you can just use:

    MeasureTime m;
    doFnc1();
    m.m();
    doFnc2();
    m.m();
    doStuff = doMoreStuff();
    andDoItAgain = doStuff.aoeuaoeu();
    m.m();
    m.s();
    

    Which will list all the marks just like before, but then after the other code is executed. Note that you shouldn't use both m.s() and m.t().

  • 2020-11-22 06:12

    To do this correctly you can go one of two ways: RDTSC or clock_gettime(). The second is about twice as fast in the run below and has the advantage of giving the correct absolute time. Note that for RDTSC to work correctly you need to use it as shown (other answers on this page have errors and may yield incorrect timing values on certain processors):

    #include <stdint.h>

    // GCC-style inline assembly for x86; CPUID serializes execution before RDTSC.
    inline uint64_t rdtsc()
    {
        uint32_t lo, hi;
        __asm__ __volatile__ (
          "xorl %%eax, %%eax\n"
          "cpuid\n"
          "rdtsc\n"
          : "=a" (lo), "=d" (hi)
          :
          : "%ebx", "%ecx" );
        return (uint64_t)hi << 32 | lo;
    }
    

    And for clock_gettime (I chose microsecond resolution arbitrarily):

    #include <stdint.h>
    #include <time.h>
    // older glibc (before 2.17) needs -lrt (real-time lib)
    // 1970-01-01 epoch UTC time, 1 µs resolution (divide by 1M to get time_t)
    uint64_t ClockGetTime()
    {
        timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);
        return (uint64_t)ts.tv_sec * 1000000LL + (uint64_t)ts.tv_nsec / 1000LL;
    }
    

    The timings and values produced:

    Absolute values:
    rdtsc           = 4571567254267600
    clock_gettime   = 1278605535506855
    
    Processing time: (10000000 runs)
    rdtsc           = 2292547353
    clock_gettime   = 1031119636
    
  • 2020-11-22 06:12

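    I'm using Borland; here is the code. Subtracting the fields one by one (for example t.ti_hund - Hun) sometimes gives a negative number, because nothing is borrowed from the next field, so the code below converts each reading to hundredths of a second before subtracting. The timing is fairly good.

    #include <dos.h>    // Borland-specific: struct time, gettime()
    #include <stdio.h>  // printf
    #include <conio.h>  // getch

    // Convert a Borland struct time to hundredths of a second since midnight.
    long to_hundredths(const struct time *t)
    {
        return ((t->ti_hour * 60L + t->ti_min) * 60L + t->ti_sec) * 100L + t->ti_hund;
    }

    int main()
    {
        struct time t;
        long start, elapsed;

        gettime(&t);
        start = to_hundredths(&t);
        printf("Start time is: %2d:%02d:%02d.%02d\n",
               t.ti_hour, t.ti_min, t.ti_sec, t.ti_hund);

        // ... your code to time ...

        // Read the time again; the result is only valid if midnight was not crossed.
        gettime(&t);
        elapsed = to_hundredths(&t) - start;
        printf("\nElapsed: %ld.%02ld s\n", elapsed / 100, elapsed % 100);
        printf("\n\nAll done. Press a key\n\n");
        getch();
        return 0;
    } // end main
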
    
  • 2020-11-22 06:14

    What others have posted about running the function repeatedly in a loop is correct.
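
    The loop approach can be sketched as follows. This is a minimal illustration, not code from the other answers: the name average_ns_per_call is hypothetical, it uses std::chrono::steady_clock as the time source, and it assumes the compiler does not optimize the measured call away.

    ```cpp
    #include <chrono>
    #include <cstdint>

    // Run fn `iterations` times and return the mean time per call in
    // nanoseconds. iterations must be greater than zero.
    template <typename Fn>
    double average_ns_per_call(Fn&& fn, std::uint64_t iterations)
    {
        const auto start = std::chrono::steady_clock::now();
        for (std::uint64_t i = 0; i < iterations; ++i)
        {
            fn();
        }
        const auto stop = std::chrono::steady_clock::now();

        // Express the total as a floating-point nanosecond count.
        const std::chrono::duration<double, std::nano> total = stop - start;
        return total.count() / static_cast<double>(iterations);
    }
    ```

    Averaging over many calls spreads the cost of reading the clock across the whole run, which is what makes sub-microsecond calls measurable at all. The loop overhead itself is still included in the result.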

    For Linux (and BSD) you want to use clock_gettime().

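    #include <time.h>  // clock_gettime() and timespec are declared here

    int main()
    {
       timespec ts;
       // CLOCK_MONOTONIC is not affected by wall-clock adjustments, which makes
       // it the safer choice for intervals; it works on FreeBSD and on Linux.
       // clock_gettime(CLOCK_MONOTONIC, &ts);
       clock_gettime(CLOCK_REALTIME, &ts); // Wall-clock time; works on Linux
    }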
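
    To turn a clock_gettime() reading into something you can subtract, the two timespec fields can be folded into one nanosecond count. The sketch below assumes a POSIX system; the helper name monotonic_ns is hypothetical.

    ```cpp
    #include <cstdint>
    #include <time.h>  // POSIX clock_gettime, timespec, CLOCK_MONOTONIC

    // Read CLOCK_MONOTONIC as a single nanosecond count (POSIX only).
    // Returns -1 if the clock could not be read.
    inline std::int64_t monotonic_ns()
    {
        timespec ts{};
        if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        {
            return -1;
        }
        return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
    }
    ```

    Calling monotonic_ns() before and after the code under test and subtracting the two values gives the elapsed time in nanoseconds, subject to the resolution the kernel actually provides for that clock.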
    

    For Windows you want to use QueryPerformanceCounter (QPC).

    Apparently there is a known issue with QPC on some chipsets, so you may want to make sure you do not have one of those chipsets. Additionally, some dual-core AMD CPUs may also cause a problem. See the second post by sebbbi, where he states:

    QueryPerformanceCounter() and QueryPerformanceFrequency() offer a bit better resolution, but have different issues. For example in Windows XP, all AMD Athlon X2 dual core CPUs return the PC of either of the cores "randomly" (the PC sometimes jumps a bit backwards), unless you specially install AMD dual core driver package to fix the issue. We haven't noticed any other dual+ core CPUs having similar issues (p4 dual, p4 ht, core2 dual, core2 quad, phenom quad).

    EDIT 2013/07/16:

    It looks like there is some controversy on the efficacy of QPC under certain circumstances as stated in http://msdn.microsoft.com/en-us/library/windows/desktop/ee417693(v=vs.85).aspx

    ...While QueryPerformanceCounter and QueryPerformanceFrequency typically adjust for multiple processors, bugs in the BIOS or drivers may result in these routines returning different values as the thread moves from one processor to another...

    However this StackOverflow answer https://stackoverflow.com/a/4588605/34329 states that QPC should work fine on any MS OS after Win XP service pack 2.

    This article shows that Windows 7 can determine whether the processor(s) have an invariant TSC and falls back to an external timer if they don't: http://performancebydesign.blogspot.com/2012/03/high-resolution-clocks-and-timers-for.html. Synchronizing across processors is still an issue.

    Other fine reading related to timers:

    • https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks
    • http://lwn.net/Articles/209101/
    • http://performancebydesign.blogspot.com/2012/03/high-resolution-clocks-and-timers-for.html
    • QueryPerformanceCounter Status?

    See the comments for more details.
