I am trying to measure the time taken by some code inside the Linux kernel, with very high accuracy, from a Linux kernel module.
For this purpose, I have tried rdtscl
If you get NO clock ticks, then there's something seriously wrong with your code. Did you write your own rdtscl [or copy it from somewhere that isn't a good source]?
By the way, modern Intel (and AMD) processors may well have "constant TSC", so a processor that is halted, sleeping, running slower, etc., will still tick away at the same rate as the others. It may still not be in sync with them, but that's a different matter.
Try running just a loop that prints the value from your counter. The RDTSC instruction by itself should take some 30-50 clock cycles, so you should see the value moving.
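As a rough sketch of that suggestion, the loop below samples the counter a few times and prints each reading. It assumes a user-space build with GCC or Clang on x86 and uses the compiler intrinsic __rdtsc() from <x86intrin.h> rather than the asker's own rdtscl; the function name print_tsc_samples is made up for this illustration.

```c
#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc() intrinsic (GCC/Clang, x86 only) */

/* Sample the TSC n times back to back and print each reading together
 * with its distance from the previous one. Returns how many samples
 * failed to advance (expected to be 0 while staying on one core). */
int print_tsc_samples(int n)
{
    int stalled = 0;
    unsigned long long prev = __rdtsc();

    for (int i = 0; i < n; i++) {
        unsigned long long now = __rdtsc();
        printf("tsc=%llu delta=%llu\n", now, now - prev);
        if (now <= prev)
            stalled++;
        prev = now;
    }
    return stalled;
}
```

If every delta prints as zero, the value is most likely not making it back to the caller, rather than the counter standing still.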
Edit: Here's my rdtsc function:
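    /* Read the 64-bit timestamp counter and store it through the pointer. */
    void rdtscl(unsigned long long *ll)
    {
        unsigned int lo, hi;
        /* RDTSC returns the low 32 bits in EAX and the high 32 bits in EDX. */
        __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
        *ll = ((unsigned long long)lo) | (((unsigned long long)hi) << 32);
    }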
Alternatively, as a function returning a value:
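    /* Read the 64-bit timestamp counter and return it directly. */
    unsigned long long rdtscl(void)
    {
        unsigned int lo, hi;
        /* RDTSC returns the low 32 bits in EAX and the high 32 bits in EDX. */
        __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
        return ((unsigned long long)lo) | (((unsigned long long)hi) << 32);
    }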
I notice that your code doesn't pass a pointer to your unsigned long, which makes me suspect that you are not actually passing the timestamp counter BACK to the caller, but rather just keeping whatever value the variable happens to have, which may well be the same for both readings.
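To make the suspected mistake concrete, here is a small sketch of the difference between passing the variable by value and passing a pointer to it. It deliberately avoids the real TSC: fake_counter is a hypothetical stand-in so the example runs on any machine, and the other function names are invented for this illustration too.

```c
#include <stdio.h>

/* Hypothetical stand-in for a hardware counter (not the real TSC). */
static unsigned long long fake_counter(void)
{
    static unsigned long long ticks = 1000;
    ticks += 42;
    return ticks;
}

/* Broken: 't' is a local copy, so the caller's variable never changes. */
void read_by_value(unsigned long long t)
{
    t = fake_counter();
    (void)t;   /* silence "set but not used" warnings */
}

/* Works: the reading is written through the pointer. */
void read_by_pointer(unsigned long long *t)
{
    *t = fake_counter();
}

/* Returns 1 if the by-value call left 'start' untouched while the
 * by-pointer call updated 'end'. */
int demo_by_value_bug(void)
{
    unsigned long long start = 0, end = 0;

    read_by_value(start);
    read_by_pointer(&end);
    printf("start=%llu end=%llu\n", start, end);
    return start == 0 && end != 0;
}
```

With the by-value version, two "readings" taken before and after the measured code would both keep their initial contents, which would look exactly like a counter that never ticks.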
The same Wikipedia article also describes issues with the TSC, as below:
With the advent of multi-core/hyper-threaded CPUs, systems with multiple CPUs, and
hibernating operating systems, the TSC cannot be relied on to provide accurate results
— unless great care is taken to correct the possible flaws: rate of tick and whether
all cores (processors) have identical values in their time-keeping registers. **There
is no promise that the timestamp counters of multiple CPUs on a single motherboard will
be synchronized**. In such cases, programmers can only get reliable results by locking
their code to a single CPU. Even then, the CPU speed may change due to power-saving
measures taken by the OS or BIOS, or the system may be hibernated and later resumed
(resetting the time stamp counter). In those latter cases, to stay relevant, the
counter must be recalibrated periodically (according to the time resolution your
application requires).
This means that modern CPUs can alter their clock rate to save power, which can affect the TSC value. Also, the TSC would not increment at all in situations such as when the kernel executes HALT and stops the processor until an external interrupt is received.
The second question is that I have an Intel Xeon i3 processor which has 4 processors,
each having 2 cores; will measuring the clock ticks give the ticks of a single
processor, or the sum of all 4 processors?
This may lead to a situation where a process reads a time on one processor, moves to a second processor, and encounters a time earlier than the one it read on the first, which makes the TSC an unstable time source.
All the cores have their own TSC, which basically counts cycles. But beware: the TSC clocks may not be synchronized! If your code starts running on one core and migrates to a second one, which is certainly possible in the general case, your count will be wrong!
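One way to avoid the migration problem, as the Wikipedia passage suggests, is to lock the measuring code to a single CPU. The following is a minimal user-space sketch using the Linux calls sched_getcpu() and sched_setaffinity(); the wrapper name pin_to_current_cpu is an assumption made for this example. Inside a kernel module the rough counterpart would be to keep preemption disabled around the measured region (for instance with get_cpu()/put_cpu()), which is not shown here.

```c
#define _GNU_SOURCE      /* must come before any #include; enables CPU_SET and sched_getcpu */
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread to the CPU it is currently running on.
 * Returns that CPU number, or -1 on failure. */
int pin_to_current_cpu(void)
{
    int cpu = sched_getcpu();
    if (cpu < 0)
        return -1;

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* A pid of 0 means "the calling thread". */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return cpu;
}
```

After a successful call, both TSC readings of a measurement come from the same core, so an offset between per-core counters no longer affects the difference.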
Some of the things mentioned here are accurate, such as the TSC not being a measure of time because of S states in the CPU. But I think the TSC can be used for relative sequencing even in a multi-core environment. There is a flag called TscInvariant which is set on Intel CPUs from the Nehalem architecture onward. On those CPUs the TSC advances at a constant rate on all cores. Therefore you will never go back in TSC count if you get context switched to a different core.
On Ubuntu you can verify it on your desktop with:
    sudo apt-get install cpuid
    cpuid | grep TscInvariant
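The same flag can also be queried from a program. The sketch below assumes GCC or Clang on x86 and uses the __get_cpuid() helper from <cpuid.h>; the invariant-TSC bit is reported in CPUID leaf 0x80000007, EDX bit 8. The function name has_invariant_tsc is made up for this illustration.

```c
#include <cpuid.h>   /* __get_cpuid() helper (GCC/Clang, x86 only) */

/* Returns 1 if the CPU advertises an invariant TSC, 0 if it does not,
 * and -1 if the extended CPUID leaf is not available. */
int has_invariant_tsc(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 0x80000007 (advanced power management); EDX bit 8 = invariant TSC. */
    if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
        return -1;
    return (edx >> 8) & 1;
}
```

Note that a virtual machine may hide this bit even when the host CPU has it, so a 0 inside a guest says little about the underlying hardware.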