Confusing gprof output

佐手, submitted 2019-11-30 07:52:54
Jesse Beder

The bottleneck turned out to be in file I/O (see Is std::ifstream significantly slower than FILE?). I switched to reading the entire file into a buffer, and it sped things up enormously.
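A minimal sketch of that approach, assuming the usual bulk-read idiom (the OP's actual code isn't shown, and the function name here is made up):

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Read the whole file into memory in one go, instead of issuing many small
// ifstream reads (and seekg/tellg calls) while parsing. Parsing then works
// on the in-memory string, which avoids per-call I/O overhead entirely.
std::string read_whole_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buf;
    buf << in.rdbuf();   // single bulk copy from the stream into the buffer
    return buf.str();
}
```

After this, seekg/tellg-style position queries become cheap index arithmetic on the string rather than calls into the stream.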

The problem here was that gprof doesn't appear to produce accurate timings while the program is blocked on file I/O (see http://www.regatta.cs.msu.su/doc/usr/share/man/info/ru_RU/a_doc_lib/cmds/aixcmds2/gprof.htm). In fact, seekg and tellg were not even on the profiling list, and they were the bottleneck!

Self seconds is the time spent in [whatever].

Cumulative seconds is a running total: the time spent in [whatever] plus the entries listed above it in the flat profile (such as [whatever] + main).

Neither of those include time spent in functions called from [whatever]. That's why you're not seeing more time listed.

If your [whatever] function is making lots of printf calls, for example, your gprof output is telling you that printf is eating the majority of that time.

This seems to be a pretty good overview of how to read gprof output. The 31.07% you are looking at is the portion of the total run time that gprof thinks is being spent in just that function (not including the functions it calls). Odds are the reason the percentage is so large while the time is so small is that gprof thinks the program doesn't take as long as you do. This is easy to check by scrolling down the first section of the gprof output: cumulative seconds keeps getting larger and larger until it tops out at the total running time of the program (from gprof's perspective). I think you will find this is about one second rather than the 16 you are expecting.
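The relationship between the two columns can be sketched as a running sum (the numbers below are illustrative, not taken from the OP's profile):

```cpp
#include <numeric>
#include <vector>

// In gprof's flat profile, "cumulative seconds" is a running total of the
// "self seconds" column: each row adds its own self time to the rows above
// it, so the last row's cumulative value is gprof's view of total run time.
std::vector<double> cumulative_from_self(const std::vector<double>& self_seconds) {
    std::vector<double> cumulative(self_seconds.size());
    std::partial_sum(self_seconds.begin(), self_seconds.end(), cumulative.begin());
    return cumulative;
}
```

For self times {0.31, 0.25, 0.20}, the cumulative column reads {0.31, 0.56, 0.76}; if 0.76 s is the last value, that is all the run time gprof accounted for, however long the wall clock says the program ran.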

As for why there is such a large discrepancy there, I can't say. Perhaps gprof isn't seeing all of the code. Or did you use time on the instrumented code while it was profiling? I wouldn't expect that to work correctly...

casualcoder

Have you tried some of the other tools mentioned in this question? It would be interesting how they compare.

Mike Dunlavey

You're experiencing a problem common to gprof and other profilers based on the same concepts - 1) sample the program counter to get some kind of histogram, 2) instrument the functions to measure times, counts, and get a call graph.

For actually locating performance problems, they are missing the point.
It's not about measuring routines, it's about finding guilty code.

Suppose you have a sampler that stroboscopically X-rays the program at random wall-clock times. In each sample, the program may be in the middle of I/O, it may be in code that you compiled, it may be in some library routine like malloc.

But no matter where it is, responsibility for that slice of time is jointly shared by every line of code on the call stack: if any one of those calls had not been made, the program would not be in the process of carrying out the work requested by that call.

So look at every line of code that shows up on multiple samples of the call stack (the more samples it is on, the better). That's where the money is. Don't just look where the program counter is. There are "deep pockets" higher up the stack.

Yes, those "seconds" values aren't per call. The percentage is of the entire run of the program. In effect, your program spent 31% of its time in that function (due to the number of calls plus the time taken per call).

You might want to read up on how to analyze gprof's flat profiles.

Correction: Sorry, those first two seconds values are cumulative as pointed out by the OP.

I think that it's odd that you're seeing 0 for "self" and "total s/call".

Quoting the section on gprof accuracy: "The actual amount of error is usually more than one sampling period. In fact, if a value is n times the sampling period, the expected error in it is the square-root of n sampling periods. If the sampling period is 0.01 seconds and foo's run-time is 1 second, the expected error in foo's run-time is 0.1 seconds. It is likely to vary this much on the average from one profiling run to the next. (Sometimes it will vary more.)"
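That rule of thumb is easy to work through, using the manual's own numbers (0.01 s sampling period, 1 s run time):

```cpp
#include <cmath>

// gprof accuracy rule of thumb, per the manual: if a measured value is
// n sampling periods long, its expected error is sqrt(n) sampling periods.
double expected_error(double run_time, double period) {
    double n = run_time / period;   // the value expressed in sampling periods
    return std::sqrt(n) * period;   // expected error converted back to seconds
}
```

expected_error(1.0, 0.01) gives roughly 0.1 s, matching the quoted foo example; for a function whose flat-profile time is near zero, the error can easily swamp the measurement.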

Also, possibly related, it might be worth noting that gprof doesn't profile multithreaded programs. You're better off using something like Sysprof or OProfile in such cases.
