Profiling some C++ number crunching code with both gprof and kcachegrind gives similar results for the functions that contribute most to the execution
gprof's timing data is statistical (the details are in the profiling section of the gprof docs).
On the other hand, KCacheGrind uses valgrind, which actually interprets all the code. So KCacheGrind can be "more accurate" (at the expense of more overhead) if the CPU modeled by valgrind is close to your real CPU.
Which one to choose also depends on what type of overhead you can handle. In my experience, gprof adds less runtime overhead (execution time, that is), but it is more intrusive (i.e. -pg adds code to each and every one of your functions). So depending on the situation, one or the other is more appropriate.
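To give a feel for what "adds code to each function" means, here is a hand-written sketch of entry instrumentation. This is not what GCC actually emits: a -pg build inserts a call to a profiling routine (mcount) at function entry, while record_call and call_counts below are hypothetical stand-ins I made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical registry of how often each function was entered.
std::map<std::string, std::size_t>& call_counts() {
    static std::map<std::string, std::size_t> counts;
    return counts;
}

// Hypothetical stand-in for the bookkeeping done on every function entry.
void record_call(const char* name) {
    ++call_counts()[name];
}

double square(double x) {
    record_call("square");  // extra work on every single call
    return x * x;
}

double sum_of_squares(int n) {
    record_call("sum_of_squares");
    assert(n >= 0);
    double total = 0.0;
    for (int i = 1; i <= n; ++i) {
        total += square(static_cast<double>(i));
    }
    return total;
}
```

The bookkeeping runs on every call, so small functions that are called very often pay the most for it, which is the sense in which this kind of instrumentation is intrusive.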
For "better" gprof data, run your code longer (and on as wide a range of test data as you can). The more samples you have, the better the measurements will be statistically.
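To put a rough number on that: the rule of thumb in the gprof manual is that for n samples the expected error is about sqrt(n) samples, so the relative error shrinks like 1/sqrt(n). A minimal sketch, assuming a sampling period of 0.01 seconds (a common value, but check the header of your own flat profile); the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: rough relative error of a sampled time measurement.
// With n samples the expected error is about sqrt(n) samples,
// so the relative error is sqrt(n) / n = 1 / sqrt(n).
double expected_relative_error(double seconds_in_function,
                               double sample_period = 0.01 /* assumed */) {
    assert(seconds_in_function > 0.0 && sample_period > 0.0);
    const double samples = seconds_in_function / sample_period;
    return 1.0 / std::sqrt(samples);
}
```

Under that assumption, a function that accumulates 1 second of run time gets about 100 samples and an expected error of roughly 10%, while one that accumulates 100 seconds gets down to roughly 1%.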