Can clock() be used as a dependable API to measure the CPU time taken to execute a snippet of code? When verified using times() / clock(), neither seems to measure the CPU time actually consumed by the snippet.
Resource usage of a process/thread is updated by the OS only periodically. It's entirely possible for a code snippet to complete before the next update, thus producing a zero resource-usage diff. I can't say anything about HP-UX or AIX; for Sun I would refer you to the Solaris Performance and Tools book. For Linux you want to look at oprofile and the newer perf tool. On the profiling side, Valgrind would be of much help.
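A minimal sketch of that effect with times() (the 10 MB buffer is only an assumption to mirror the question): CPU time is accounted in clock ticks, typically 100 per second, so a snippet that finishes within one tick shows a zero diff.

#include <stdio.h>
#include <string.h>
#include <sys/times.h>
#include <unistd.h>

int main(void)
{
    static char buf[10 * 1024 * 1024];    /* 10 MB, mirroring the question */
    struct tms before, after;

    times(&before);
    memset(buf, 0, sizeof buf);           /* snippet under test */
    times(&after);

    /* Often prints 0 user / 0 system ticks: the snippet finished
       before the kernel charged any CPU time to the process. */
    printf("user ticks: %ld, system ticks: %ld (tick = 1/%ld s)\n",
           (long)(after.tms_utime - before.tms_utime),
           (long)(after.tms_stime - before.tms_stime),
           sysconf(_SC_CLK_TCK));
    return 0;
}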
I would give getrusage a try and check system and user time. Also check with gettimeofday to compare with wall-clock time. I would try to correlate the time with the shell's time command, as a sanity check.
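A rough sketch of that combination; the memset() over an assumed 10 MB buffer is just a stand-in for the snippet being measured:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

static double tv_diff(struct timeval a, struct timeval b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_usec - a.tv_usec) / 1e6;
}

int main(void)
{
    struct rusage ru0, ru1;
    struct timeval wall0, wall1;
    char *buf = malloc(10 * 1024 * 1024);

    getrusage(RUSAGE_SELF, &ru0);
    gettimeofday(&wall0, NULL);

    memset(buf, 0, 10 * 1024 * 1024);     /* code under test */

    gettimeofday(&wall1, NULL);
    getrusage(RUSAGE_SELF, &ru1);

    /* user + system should roughly match what "time ./a.out" reports */
    printf("user   %.6f s\n", tv_diff(ru0.ru_utime, ru1.ru_utime));
    printf("system %.6f s\n", tv_diff(ru0.ru_stime, ru1.ru_stime));
    printf("wall   %.6f s\n", tv_diff(wall0, wall1));
    free(buf);
    return 0;
}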
You should also consider that the compiler may be optimizing the loop. Since the memset does not depend on the loop variable, the compiler will certainly be tempted to apply an optimization known as loop-invariant code motion.
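If you do want the memset repeated on every iteration, one way (a GCC/Clang-specific sketch, not taken from the answer) to stop the compiler from hoisting or dropping it is an empty asm statement with a "memory" clobber:

#include <string.h>

void timed_loop(char *buf, size_t len, int iterations)
{
    for (int i = 0; i < iterations; i++) {
        memset(buf, 0, len);
        /* Tell the compiler the buffer may be observed elsewhere, so the
           memset cannot be moved out of the loop or eliminated. */
        __asm__ __volatile__("" : : "r"(buf) : "memory");
    }
}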
I would also caution that a 10 MB, possibly in-cache, clear will really be 1.25 or 2.5 million CPU operations, as memset certainly writes in 4-byte or 8-byte quantities. While I rather doubt that this could be done in less than a microsecond, as stores are a bit expensive and 100K adds some L1 cache pressure, you are talking about not much more than one operation per nanosecond, which is not that hard to sustain for a multi-GHz CPU.
One imagines that 600 ns would round off to 1 clock tick, but I would worry about that as well.
You can use clock() (which returns a clock_t) to get the number of CPU ticks used since the program started. Or you can use the Linux time command, e.g.: time [program] [arguments]
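A minimal sketch of the clock() approach (the 10 MB size is only an assumption to mirror the question); divide the tick delta by CLOCKS_PER_SEC to get seconds:

#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void)
{
    static char buf[10 * 1024 * 1024];
    clock_t start = clock();
    memset(buf, 0, sizeof buf);           /* snippet to measure */
    clock_t end = clock();
    printf("CPU time: %.6f s (%ld ticks)\n",
           (double)(end - start) / CLOCKS_PER_SEC, (long)(end - start));
    return 0;
}

For a sanity check, compare against time [program] [arguments], which reports real, user and sys time for the whole run.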
On recent Linuxes (*), you can get this information from the /proc filesystem. In the file /proc/PID/stat, the 14th entry is the number of jiffies used in userland code and the 15th entry is the number of jiffies used in system code. If you want to see the data on a per-thread basis, read the file /proc/PID/task/TID/stat instead.
To convert jiffies to microseconds, you can use the following:
#include <unistd.h>   /* for sysconf() */

#define USEC_PER_SEC 1000000UL

long long jiffies_to_microsecond(long long jiffies)
{
    long hz = sysconf(_SC_CLK_TCK);
    if (hz <= USEC_PER_SEC && !(USEC_PER_SEC % hz))
    {
        /* Each tick is a whole number of microseconds. */
        return (USEC_PER_SEC / hz) * jiffies;
    }
    else if (hz > USEC_PER_SEC && !(hz % USEC_PER_SEC))
    {
        /* More than one tick per microsecond: round up. */
        return (jiffies + (hz / USEC_PER_SEC) - 1) / (hz / USEC_PER_SEC);
    }
    else
    {
        return (jiffies * USEC_PER_SEC) / hz;
    }
}
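A hedged usage sketch, assuming the jiffies_to_microsecond() helper above is in scope and the field positions documented in proc(5): read the 14th and 15th fields from /proc/self/stat and convert them. Note that the second field (the command name in parentheses) can contain spaces, so robust code should scan past the closing ')' first.

#include <stdio.h>

/* Returns 0 on success; fills in user and system CPU time in microseconds. */
int read_cpu_usec(long long *user_usec, long long *sys_usec)
{
    long long utime, stime;
    FILE *f = fopen("/proc/self/stat", "r");
    if (!f)
        return -1;
    /* Skip the first 13 fields (pid, comm, state, ppid, ..., cmajflt),
       then read utime (14th) and stime (15th). */
    if (fscanf(f, "%*d %*s %*s %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lld %lld",
               &utime, &stime) != 2) {
        fclose(f);
        return -1;
    }
    fclose(f);
    *user_usec = jiffies_to_microsecond(utime);
    *sys_usec  = jiffies_to_microsecond(stime);
    return 0;
}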
If all you care about is the per-process statistics, getrusage is easier. But if you want to be prepared to do this on a per-thread basis, this technique is better: other than the file name, the code for getting the data per-process or per-thread is identical.
* - I'm not sure exactly when the stat file was introduced. You will need to verify your system has it.
There is some info here on HP's page about high-resolution timers. The same trick, _Asm_mov_from_ar(_AREG_ITC);, is also used in http://www.fftw.org/cycle.h. I still have to confirm whether this can really be the solution.
Sample program, as tested on HP-UX 11.31:
bbb@m_001/tmp/prof > ./perf_ticks 1024
ticks-memset {func [1401.000000] inline [30.000000]} noop [9.000000]
bbb@m_001/tmp/prof > cat perf_ticks.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include "cycle.h" /* the one from http://www.fftw.org/cycle.h */

void test_ticks(char* sbuf, int* len)
{
    memset((char*)sbuf, 0, *len);
}

int main(int argc, char* argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <buffer-size>\n", argv[0]);
        return 1;
    }
    int len = atoi(argv[1]);
    char *sbuf = (char*)malloc(len);
    ticks t1, t2, t3, t4, t5, t6;

    /* Measure memset via a function call, memset inlined, and an empty
       region (back-to-back getticks) to gauge the measurement overhead. */
    t1 = getticks(); test_ticks(sbuf, &len); t2 = getticks();
    t3 = getticks(); memset((char*)sbuf, 0, len); t4 = getticks();
    t5 = getticks(); t6 = getticks();

    /* elapsed() from cycle.h returns a double, so %f is the right format */
    printf("ticks-memset {func [%f] inline [%f]} noop [%f]\n",
           elapsed(t2, t1), elapsed(t4, t3), elapsed(t6, t5));
    free(sbuf);
    return 0;
}
bbb@m_001/tmp/prof >