I would like to profile my C++ application on Linux. I would like to find out how much time it spends on CPU processing versus how much time it spends blocked on I/O or idle.
google-perf-tools - a much faster alternative to callgrind (and it can generate output in the same format as callgrind, so you can use KCacheGrind).
LTTng is a good tool to use for full system profiling.
callgrind is a very good tool, but I found OProfile to be more 'complete'. Also, it is the only one that lets you specify module and/or kernel source to allow deeper insight into your bottlenecks. The output is supposed to be able to interface with KCacheGrind, but I had trouble with that, so I used Gprof2Dot instead. With it you can export your call graph to a .png.
Edit:
OProfile looks at the overall system so the process will just be:
[setup oprofile]
opcontrol --init
opcontrol --vmlinux=/path/to/vmlinux (or --no-vmlinux)
opcontrol --start
[run your app here]
opcontrol --stop (or opcontrol --shutdown; see the man page for the difference)
Then, to start looking at the results, see the man page for opreport.
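
For the CPU vs. blocked/idle split specifically, a rough first check that needs no profiler is to compare wall-clock time with the CPU time the process was charged. Here is a minimal sketch, assuming Linux/POSIX getrusage is available. The names Timing, cpu_seconds, measure and demo are made up for illustration and do not come from any of the tools above:

```cpp
#include <sys/resource.h>  // getrusage (POSIX)
#include <cassert>
#include <chrono>
#include <cstdio>
#include <thread>

// Hypothetical helper: wall-clock and CPU seconds for one measured region.
struct Timing {
    double wall;  // elapsed real time
    double cpu;   // user + system CPU time charged to the process
    // Time not spent on the CPU: blocked on I/O, sleeping, waiting on locks...
    double waiting() const { return wall - cpu; }
};

// Total CPU seconds (user + system) consumed by this process so far.
static double cpu_seconds() {
    rusage ru{};
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;
    auto to_sec = [](const timeval& tv) { return tv.tv_sec + tv.tv_usec / 1e6; };
    return to_sec(ru.ru_utime) + to_sec(ru.ru_stime);
}

// Runs `work` once and returns how much wall and CPU time it took.
template <typename F>
Timing measure(F&& work) {
    auto wall_start = std::chrono::steady_clock::now();
    double cpu_start = cpu_seconds();
    work();
    double cpu_end = cpu_seconds();
    auto wall_end = std::chrono::steady_clock::now();
    return Timing{std::chrono::duration<double>(wall_end - wall_start).count(),
                  cpu_end - cpu_start};
}

// Example workload: some CPU-bound work followed by a sleep that stands in
// for blocking I/O.
void demo() {
    Timing t = measure([] {
        volatile unsigned long sink = 0;
        for (unsigned long i = 0; i < 50000000UL; ++i) sink = sink + i;
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
    });
    std::printf("wall %.3fs, cpu %.3fs, waiting %.3fs\n",
                t.wall, t.cpu, t.waiting());
}
```

Call demo() from main to try it, or wrap the part of your own program you care about in measure(...) and compare cpu against wall. This only gives totals: it cannot tell I/O wait apart from sleeping or lock contention, and in a multi-threaded program the CPU time is summed over all threads, so it can exceed the wall time. For where the time goes, you still want one of the profilers above.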