rdtsc accuracy across CPU cores

后端 未结 6 712
一生所求
一生所求 2020-11-28 22:53

I am sending network packets from one thread and receiving replies on a 2nd thread that runs on a different CPU core. My process measures the time between send & receiv

相关标签:
6条回答
  • 2020-11-28 23:20

    You can set thread affinity using sched_set_affinity() API in order to run your thread on one CPU core.

    0 讨论(0)
  • 2020-11-28 23:22

    On linux you can use clock_gettime(3) with CLOCK_MONOTONIC_RAW, which gives you nanoseconds resulotion and in not subject to ntp updates (if any happened).

    0 讨论(0)
  • 2020-11-28 23:30

    On recent processors you can do it between separate cores of the same package (i.e. a system with just one core iX processor), you just can't do it in separate packages (processors), because they won't share the rtc. You can get away with it via cpu affinity (locking relevant threads to specific cores), but then again it would depend on the way your application behaves.

    On linux you can check constant_tsc on /proc/cpuinfo in order to see if the processor has a single tsc valid for the entire package. The raw register is in CPUID.80000007H:EDX[8]

    What I read around, but have not yet confirmed programatically, is that AMD cpus from revision 11h onwards have the same meaning for this cpuid bit.

    0 讨论(0)
  • 2020-11-28 23:32

    I recommend that you don't use rdtsc. Not only is it not portable, it's not reliable and generally won't work - on some systems the rdtsc does not update uniformly (like if you're using speedstep etc). If you want accurate timing information you should set the SO_TIMESTAMP option on the socket and use recvmsg() to get the message with a (microsecond resolution) timestamp.

    Moreover, the timestamp you get with SO_TIMESTAMP actually IS the time the kernel got the packet, not when your task happened to notice.

    0 讨论(0)
  • 2020-11-28 23:43

    X86_FEATURE_CONSTANT_TSC + X86_FEATURE_NONSTOP_TSC bits in cpuid (edx=x80000007, bit #8; check unsynchronized_tsc function of linux kernel for more checks)

    Intel's Designer's vol3b, section 16.11.1 Invariant TSC it says the following

    "16.11.1 Invariant TSC

    The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor's support for invariant TSC is indicated by CPUID.80000007H:EDX[8].

    The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource."

    So, if TSC can be used for wallclock, they are guaranteed to be in sync.

    0 讨论(0)
  • 2020-11-28 23:43

    In fact, it seems that cores doesn´t share TSC, check this thread: http://software.intel.com/en-us/forums/topic/388964

    Summarizing, different cores does not share TSC, sometimes TSC can get out of synchronization if a core change to an specific energy state, but it depends on the kind of CPU, so you need to check the Intel documentation. It seems that most Operating Systems synchronize TSC on boot.
    I checked the differences between TSC on different cores, using an exciting-reacting algorithm, on a Linux Debian machine with core i5 processor. The exciter process (in one core) writed the TSC in a shared variable, when the reacting process detected a change in that variable it compares its value and compares it with its own TSC. This is an example output of my test program:

    TSC ping-pong test result:
    TSC cores (exciter-reactor): 0-1
    100 records, avrg: 159, range: 105-269
    Dispersion: 13
    TSC ping-pong test result:
    TSC cores (exciter-reactor): 1-0
    100 records, avrg: 167, range: 125-410
    Dispersion: 13
    

    The reaction time when the exciter CPU is 0 (159 tics on average) is almost the same than when the exciter CPU is 1 (167 tics). This indicates that they are pretty well synchronized (perhaps with a few tics of difference). On other core pairs, results were very similar.
    On the other hand, rdtscp assembly instruction return a value indicating the CPU in which the TSC was read. It is not your case but it can be useful when you want to measure time in a simple code segment and you want to ensure that the process was not moved of CPU in the middle of the code.

    0 讨论(0)
提交回复
热议问题