In general, the method I use is this.
I'm not so much interested in timing pieces of the code as in finding big, unnecessary time-takers so I can clean them out and get a speedup.
It's really a different process.
ADDED: To elaborate, the typical performance problem I see is that some activity (nearly always a function call) is consuming some fraction of the time, like 10%, 50%, 90%, whatever, and it is not really necessary: it can be replaced with something else or not done at all, and that fraction of time will be saved.
Suppose for illustration it's 50%.
I take random-time samples of the call stack, say 10 of them, and that call has a 50% chance of appearing on each one, so it will show up on roughly half of the samples. That attracts my attention, and I look to see whether what it is doing is really necessary; if not, I fix it and get the speedup.
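Not that I actually automate any of this (I just pause the program under a debugger and read the stack), but here is a minimal sketch of the random-sampling idea in Python, in case it makes the mechanics concrete. The names `sample_stacks`, `busy_work`, and `expensive_helper` are invented for the illustration; each random sample just records which call sites are on the stack at that moment.

```
import collections
import random
import sys
import threading
import time
import traceback

def sample_stacks(main_thread_id, n_samples=10, max_wait=0.5):
    """Grab the main thread's call stack at n_samples random moments."""
    counts = collections.Counter()
    for _ in range(n_samples):
        time.sleep(random.uniform(0.0, max_wait))      # pause at a random time
        frame = sys._current_frames()[main_thread_id]  # snapshot of the stack
        stack = traceback.extract_stack(frame)
        # Count each call site at most once per sample, so a site's count is
        # the number of samples it appears on (its residence on the stack).
        for site in {(f.filename, f.lineno, f.name) for f in stack}:
            counts[site] += 1
    return counts

def expensive_helper():
    # Stand-in for the "not really necessary" work the samples should expose.
    return sum(i * i for i in range(200_000))

def busy_work(seconds=5):
    end = time.time() + seconds
    while time.time() < end:
        expensive_helper()

if __name__ == "__main__":
    main_id = threading.get_ident()
    sampler = threading.Thread(
        target=lambda: print(sample_stacks(main_id).most_common(5)))
    sampler.start()
    busy_work()
    sampler.join()
```

Run that and the call sites inside `expensive_helper` dominate the sample counts, which is exactly the kind of thing that attracts attention.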
Now, was that measuring? If so, it was really poor measurement, because the number of samples was so small. If 5 out of 10 samples showed the call, the fraction of time is probably around 50%, give or take, and it's definitely more than 10%. So I may not know the percent with precision, but I definitely know it is worth fixing, and I definitely know exactly where the problem is.
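For what it's worth, the binomial arithmetic backs that up. This little check (not part of the method, it just tests the claim) shows that if the call really took only 10% of the time, seeing it on 5 or more of 10 random samples would be very unlikely, while 5 of 10 is entirely consistent with something near 50%:

```
from math import comb

def tail_probability(p, n=10, k=5):
    """P(the call shows up on >= k of n random samples | true fraction p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(tail_probability(0.1))   # ~0.0016, so 10% is effectively ruled out
print(tail_probability(0.5))   # ~0.62, entirely consistent with ~50%
```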
(Side note: I did not count the number of calls or estimate the call duration. Rather, I estimated the cost of the call, meaning what removing it would save, which is its fractional residence time on the stack. Also notice that I am working at the call level, not the function level. I may care what function calls are above and below the call of interest, but beyond that, function-level concepts such as exclusive time, call graphs, and recursion play no part.)
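Continuing the earlier sketch, turning raw sample counts into that notion of cost is just a division: a call site's estimated cost is the fraction of samples on which it appears. The helper name `report_costs` is invented; it assumes the Counter produced by the `sample_stacks` sketch above.

```
def report_costs(counts, n_samples=10):
    """Print each call site's estimated cost: its fraction of the samples."""
    for (filename, lineno, name), hits in counts.most_common():
        print(f"{filename}:{lineno} ({name}): "
              f"on {hits}/{n_samples} samples, ~{hits / n_samples:.0%} of time")
```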
That's why I say that measuring performance and finding performance problems, while they may be complementary, are really different tasks.