Is clock_gettime() adequate for submicrosecond timing?

误落风尘 2020-12-28 16:14

I need a high-resolution timer for the embedded profiler in the Linux build of our application. Our profiler measures scopes as small as individual functions, so it needs a timer precision of better than 25 nanoseconds.

6 Answers
  • 2020-12-28 16:42

    Give clockid_t CLOCK_MONOTONIC_RAW a try:

    CLOCK_MONOTONIC_RAW (since Linux 2.6.28; Linux-specific) Similar to CLOCK_MONOTONIC, but provides access to a raw hardware-based time that is not subject to NTP adjustments or the incremental adjustments performed by adjtime(3).

    From Man7.org
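    A minimal sketch of how a profiler might use this clock (the helper name `now_raw_ns` is my own, not from any library):

    ```c
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    /* Read CLOCK_MONOTONIC_RAW and flatten it to a single nanosecond count. */
    static uint64_t now_raw_ns(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
    }

    int main(void) {
        uint64_t t0 = now_raw_ns();
        /* ... scope being profiled ... */
        uint64_t t1 = now_raw_ns();
        printf("elapsed: %llu ns\n", (unsigned long long)(t1 - t0));
        return 0;
    }
    ```

    Because CLOCK_MONOTONIC_RAW is not slewed by NTP, differences between two readings reflect the hardware tick rate, which is usually what you want for profiling short scopes.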

  • 2020-12-28 16:42

    It's hard to give a globally applicable answer because the hardware and software implementation will vary widely.

    However, yes, most modern platforms will have a suitable clock_gettime call that is implemented purely in user-space using the VDSO mechanism, and will in my experience take 20 to 30 nanoseconds to complete (but see Wojciech's comment below about contention).

    Internally, this is using rdtsc or rdtscp for the fine-grained portion of the time-keeping, plus adjustments to keep this in sync with wall-clock time (depending on the clock you choose) and a multiplication to convert from whatever units rdtsc has on your platform to nanoseconds.

    Not all of the clocks offered by clock_gettime will implement this fast method, and it's not always obvious which ones do. Usually CLOCK_MONOTONIC is a good option, but you should test this on your own system.
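    Testing it on your own system is straightforward: time a tight loop of calls and divide. A rough sketch (the function name `overhead_ns` and the iteration count are my own choices, and a loop like this includes the loop overhead itself in the per-call figure):

    ```c
    #include <stdio.h>
    #include <time.h>

    /* Estimate the per-call cost of clock_gettime for a given clock id
     * by bracketing n calls with CLOCK_MONOTONIC readings. */
    static double overhead_ns(clockid_t id, int n) {
        struct timespec start, end, scratch;
        clock_gettime(CLOCK_MONOTONIC, &start);
        for (int i = 0; i < n; i++)
            clock_gettime(id, &scratch);
        clock_gettime(CLOCK_MONOTONIC, &end);
        double total = (end.tv_sec - start.tv_sec) * 1e9
                     + (end.tv_nsec - start.tv_nsec);
        return total / n;
    }

    int main(void) {
        printf("CLOCK_MONOTONIC:     %.1f ns/call\n",
               overhead_ns(CLOCK_MONOTONIC, 1000000));
        printf("CLOCK_MONOTONIC_RAW: %.1f ns/call\n",
               overhead_ns(CLOCK_MONOTONIC_RAW, 1000000));
        return 0;
    }
    ```

    A clock that falls off the VDSO fast path will show up immediately as an order-of-magnitude jump in this number.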

  • 2020-12-28 16:44

    I ran some benchmarks on my system, a quad-core Xeon E5645 with a constant TSC, running kernel 3.2.54. The results were:

    clock_gettime(CLOCK_MONOTONIC_RAW)       100ns/call
    clock_gettime(CLOCK_MONOTONIC)           25ns/call
    clock_gettime(CLOCK_REALTIME)            25ns/call
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID)  400ns/call
    rdtsc (implementation @DavidSchwarz)     600ns/call
    

    So it looks like, on a reasonably modern system, rdtsc (the accepted answer) is the worst route to go down.

  • 2020-12-28 16:45

    No. You'll have to use platform-specific code to do it. On x86 and x86-64, you can use 'rdtsc' to read the Time Stamp Counter.

    Just port the rdtsc assembly you're using.

    #include <stdint.h>

    __inline__ uint64_t rdtsc(void) {
      uint32_t lo, hi;
      /* Serialize with CPUID so earlier instructions retire before we read the TSC. */
      __asm__ __volatile__ (
          "xorl %%eax, %%eax\n\t"
          "cpuid"
          ::: "%rax", "%rbx", "%rcx", "%rdx");
      /* We cannot use "=A", since that would use %rax on x86_64 and return
         only the lower 32 bits of the TSC. */
      __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
      return (uint64_t)hi << 32 | lo;
    }
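    Note that rdtsc returns ticks, not nanoseconds, so a profiler built on it needs a calibration step. A rough sketch, assuming x86 with a constant/invariant TSC, and using the compiler intrinsic `__rdtsc` from `x86intrin.h` rather than the inline asm above (the helper name `tsc_per_ns` and the 10 ms calibration window are my own choices):

    ```c
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>
    #include <x86intrin.h>   /* __rdtsc (GCC/Clang) */

    /* Measure how many TSC ticks elapse per nanosecond of
     * CLOCK_MONOTONIC_RAW over a ~10 ms busy interval. */
    static double tsc_per_ns(void) {
        struct timespec a, b;
        clock_gettime(CLOCK_MONOTONIC_RAW, &a);
        uint64_t t0 = __rdtsc();
        do {
            clock_gettime(CLOCK_MONOTONIC_RAW, &b);
        } while ((b.tv_sec - a.tv_sec) * 1000000000ll
                 + (b.tv_nsec - a.tv_nsec) < 10000000);
        uint64_t t1 = __rdtsc();
        double ns = (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
        return (double)(t1 - t0) / ns;
    }

    int main(void) {
        printf("TSC ticks per ns: %.3f\n", tsc_per_ns());
        return 0;
    }
    ```

    On a CPU without an invariant TSC, or across migrations between cores with unsynchronized counters, this calibration is unreliable, which is one reason clock_gettime is usually the safer interface.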
    
  • 2020-12-28 16:56

    I need a high-resolution timer for the embedded profiler in the Linux build of our application. Our profiler measures scopes as small as individual functions, so it needs a timer precision of better than 25 nanoseconds.

    Have you considered oprofile or perf? You can use the performance counter hardware on your CPU to get profiling data without adding instrumentation to the code itself. You can see data per-function, or even per-line-of-code. The "only" drawback is that it won't measure wall clock time consumed, it will measure CPU time consumed, so it's not appropriate for all investigations.

  • 2020-12-28 16:57

    You are calling clock_gettime with a clockid_t control parameter, which means the API has to branch through an if-else tree to work out which kind of time you want. I know you can't avoid that with this call, but see if you can dig into the system code and call what the kernel eventually calls directly. Also, note that your measurement includes the loop overhead (the i++ and the conditional branch).
