Understanding VTune report

Submitted by 我与影子孤独终老i on 2019-12-11 01:47:10

Question


This is a follow-up to an existing thread (http://stackoverflow.com/questions/12724887/caching-in-a-high-performance-financial-application). I found that it is not the cache that hinders my application. To cut a long story short, I have an application that spends 70 percent of its runtime in one function (15 seconds out of 22). I would therefore like to cut the runtime of this function as much as possible, because the function is intended for MUCH larger data (i.e. 22 seconds is not the planned runtime :)

The problem is that VTune's output puzzles me: the code seems to spend a great deal of time in completely unexpected places. I have run out of ideas, so I'm posting my project together with the profiler results here.

Taking a look at the incriminated evaluateExits() function, these things puzzle me:

1/ The function spends 2.2s calling an inline function that returns 1 regardless of its parameters (line 425, this->contractManager->contractCount()). Note: the version that returns 1 regardless of parameters is only one of the possible cases, so I can't just set "contractCount=1" and leave it at that. Can the indirection through the virtual table pointer eat up those 2.2 seconds (contractCount() is a virtual method)?

2/ The function spends 3.3s on min(uint1, uint2) (line 432), even though I'm using a version of wmin that should be as CPU-friendly as possible.

3/ The function spends 1.6s on line 512, which is a very trivial operation, and the function being called is not a virtual one.

So the questions are: why do these three lines of code take so much time? What am I overlooking? And how could I optimize my code to make it run faster? Should I replace wmin() with an SSE version of min applied to whole arrays?
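To make the last question concrete, here is a minimal sketch of what "an SSE version of min applied to whole arrays" could look like for unsigned 32-bit values. The function name min_arrays and its signature are hypothetical and not taken from the project. The sketch uses only SSE2 intrinsics, so the unsigned comparison is emulated by flipping the sign bit before a signed compare, and it falls back to a scalar loop when SSE2 is not available. Whether it actually beats a scalar wmin() has to be measured on the real data.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define MINARR_HAVE_SSE2 1
#endif

// Hypothetical helper: out[i] = min(a[i], b[i]) for i in [0, n).
void min_arrays(const std::uint32_t* a, const std::uint32_t* b,
                std::uint32_t* out, std::size_t n) {
    std::size_t i = 0;
#ifdef MINARR_HAVE_SSE2
    // SSE2 only has a signed 32-bit compare, so flip the sign bit of both
    // operands first; this maps unsigned order onto signed order.
    const __m128i bias = _mm_set1_epi32(INT32_MIN);
    for (; i + 4 <= n; i += 4) {
        const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        // Lanes where a > b (unsigned) become all ones, the rest all zeros.
        const __m128i gt = _mm_cmpgt_epi32(_mm_xor_si128(va, bias),
                                           _mm_xor_si128(vb, bias));
        // Select b where a > b, otherwise a.
        const __m128i mn = _mm_or_si128(_mm_and_si128(gt, vb),
                                        _mm_andnot_si128(gt, va));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), mn);
    }
#endif
    // Scalar tail, and the full loop when SSE2 is unavailable.
    for (; i < n; ++i) {
        out[i] = std::min(a[i], b[i]);
    }
}
```

This only pays off if the values being compared already sit in contiguous arrays; if each min() depends on the result of the previous iteration, the loop cannot be batched this way.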

Any input is much appreciated. Daniel

EDIT: Looking at the assembly, I found that in case 1/ it really is the vfptr that makes the code "slow". I replaced the virtual function call with one of Don Clugston's fastdelegates, but there was no performance change whatsoever (I have no clue why). Following Nightingale's comment, the attachments should now contain all the necessary files. However, the binary cannot be run successfully, as it connects to shared memory that holds hundreds of MB of data.
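For reference, one alternative to a delegate that could be tried is to resolve the dynamic type once, outside the hot loop, and run the loop against the concrete type. The sketch below is only an illustration under stated assumptions: ContractManagerBase, SingleContractManager, the single uint32_t parameter, and both totalContracts functions are hypothetical stand-ins, not the classes from the project. Because the concrete class is marked final and the loop is templated on it, the compiler is allowed to call contractCount() directly and inline it instead of going through the vtable on every iteration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the contract manager interface.
struct ContractManagerBase {
    virtual ~ContractManagerBase() = default;
    virtual std::uint32_t contractCount(std::uint32_t signal) const = 0;
};

// The "always returns 1" case; final means no further overrides can exist.
struct SingleContractManager final : ContractManagerBase {
    std::uint32_t contractCount(std::uint32_t) const override { return 1; }
};

// Baseline: every iteration dispatches through the vtable.
std::uint64_t totalContractsVirtual(const ContractManagerBase& manager,
                                    const std::uint32_t* signals, std::size_t n) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += manager.contractCount(signals[i]);
    }
    return total;
}

// Templated on the concrete manager type: when Manager is a final class,
// the call target is known at compile time and can be inlined.
template <class Manager>
std::uint64_t totalContracts(const Manager& manager,
                             const std::uint32_t* signals, std::size_t n) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += manager.contractCount(signals[i]);
    }
    return total;
}
```

The caller would pick the instantiation once (for example with a dynamic_cast or a type tag) before entering the loop. Whether this changes the 2.2s figure is not something the sketch can show; it would need to be profiled again.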

So, I attach the whole project coupled with VTune's results here and here


Answer 1:


Daniel,

I wanted to take a look at your VTune results but unfortunately you did not include the binary module for which the result was collected, so I couldn't look into the assembly that should be of greatest value here. Can you re-post your project archive with binary file and debug information file included?

I also attempted to re-build your sources, but a number of header files could not be found:

  • Some Qt headers (I don't have Qt installed and am not an expert in setting it up)
  • parameterHolder.h file
  • externFloatConsts.h file

So, in order for me to help, it would be good to have these files or the binary that was used to collect the data.



Source: https://stackoverflow.com/questions/12826508/understanding-vtune-report
