Gprof: specific function time [duplicate]

Submitted by 让人想犯罪 __ on 2019-12-01 12:09:13
Basile Starynkevitch

You are nearly repeating another question about function execution time.

As I answered there, it is difficult (because of the hardware!) to reliably measure the execution time of one particular function, especially if that function takes little time (e.g. less than a millisecond). Your original question pointed to these methods.

I would suggest using clock_gettime(2) with CLOCK_REALTIME or perhaps CLOCK_THREAD_CPUTIME_ID.
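A minimal sketch of that suggestion, assuming Linux with glibc: it reads CLOCK_THREAD_CPUTIME_ID around many repeated calls and reports the average per call, because a single short call is close to the clock's own overhead. The names work, now_ns and avg_ns_per_call are hypothetical helpers invented for this illustration, and work merely stands in for the function you want to time.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* exposes clock_gettime and the clock ids */
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the short function being measured. */
uint64_t work(uint64_t n)
{
    uint64_t sum = 0;
    for (uint64_t i = 0; i < n; i++)
        sum += i * i;
    return sum;
}

/* Read the given clock and return its value in nanoseconds (0 on failure). */
uint64_t now_ns(clockid_t clk)
{
    struct timespec ts;
    if (clock_gettime(clk, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Average CPU time per call of work(n), in nanoseconds, over reps calls. */
double avg_ns_per_call(uint64_t n, unsigned reps)
{
    volatile uint64_t arg = n;   /* re-read on every iteration, so the call is not hoisted */
    volatile uint64_t sink = 0;  /* consumes the result, so the call is not dropped */

    if (reps == 0)
        return 0.0;

    uint64_t start = now_ns(CLOCK_THREAD_CPUTIME_ID);
    for (unsigned r = 0; r < reps; r++)
        sink += work(arg);
    uint64_t stop = now_ns(CLOCK_THREAD_CPUTIME_ID);

    return (double)(stop - start) / (double)reps;
}
```

Calling avg_ns_per_call(1000, 100000) from your own main gives a rough per-call figure. The result still includes loop overhead and depends on cache state, so treat it as an estimate rather than an exact timing.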

gprof(1) (after compilation with -pg) works with profil(3) and uses a sampling technique: a timer set with setitimer(2) and ITIMER_PROF sends a SIGPROF signal (see signal(7)) at periodic intervals (e.g. every 10 milliseconds), so the program counter is sampled periodically. Read the wikipage on gprof and notice that profiling may significantly increase the running time.
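To illustrate the sampling mechanism, here is a rough sketch, assuming Linux with glibc, that installs a SIGPROF handler and arms the profiling timer with setitimer(2). It is not how gprof itself is implemented: a real profiler records the interrupted program counter in the handler, while this sketch only counts samples. The names start_sampling, stop_sampling, sample_count and burn_until_samples are hypothetical.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* exposes sigaction, SA_RESTART and setitimer */
#endif
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Number of SIGPROF signals received so far. */
static volatile sig_atomic_t samples = 0;

/* Signal handler: a real profiler would record the program counter here. */
static void on_sigprof(int signo)
{
    (void)signo;
    samples++;
}

/* Deliver SIGPROF after every interval_us microseconds of CPU time. */
int start_sampling(long interval_us)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval tv;
    tv.it_interval.tv_sec = interval_us / 1000000;
    tv.it_interval.tv_usec = interval_us % 1000000;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_PROF, &tv, NULL);
}

/* Disarm the profiling timer. */
int stop_sampling(void)
{
    struct itimerval zero;
    memset(&zero, 0, sizeof zero);
    return setitimer(ITIMER_PROF, &zero, NULL);
}

/* Return how many samples have been taken. */
int sample_count(void)
{
    return (int)samples;
}

/* Burn CPU until at least target samples have arrived; return the spin count. */
unsigned long burn_until_samples(int target)
{
    volatile unsigned long spins = 0;
    while (samples < target)
        spins++;
    return spins;
}
```

With a 10 millisecond interval, a function that runs for well under a millisecond will usually be hit by no sample at all on a given call, which is why its reported time is so imprecise.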

If your function executes in a short time (less than a millisecond), the profiling gives an imprecise measurement (read about heisenbugs).

In other words, profiling and timing a short-running function alters the behavior of the program (and this would happen on other operating systems too!). You might have to give up the goal of measuring the timing of your function precisely, reliably and accurately without disturbing it. The measurement might not even have a precise meaning, e.g. because of the CPU cache.

You could use gprof without any -F argument and, if needed, post-process the textual profile output (e.g. with GNU awk) to extract the information you want.

BTW, the precise timing of a particular function might not be important. What is important is the benchmarking of the entire application.

You could also ask the compiler to optimize your program even more; if you are using link-time optimization, i.e. compiling and linking with g++ -flto -O2, the notion of timing a small function may even cease to exist (because the compiler and the linker could have inlined it without your knowing).

Consider also that current superscalar processors have such a complex micro-architecture (instruction pipeline, caches, branch predictor, register renaming, speculative execution, out-of-order execution, etc.) that the very notion of timing a short function is ill-defined. You cannot predict or measure it.
