automated assembly loop level profiling

和自甴很熟 submitted on 2019-12-12 09:01:48

Question


Does anyone know of a loop-level profiler that works at the assembly level?

I have been using gprof, but gprof profiles at the function level and hides loops. To optimize my code I need something that goes down to the loop level. I want it to be automated and to just give me output the way gprof does. dtrace was recommended to me, but I have no idea where to start. Can anyone point me in the right direction? For example:

main:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $16, %esp
    movl    $5000000, -4(%ebp)
    movl    $0, -12(%ebp)
    movl    $0, -8(%ebp)
    jmp     .L2

.L3:
    movl    -8(%ebp), %eax
    addl    %eax, -12(%ebp)
    addl    $1, -8(%ebp)

.L2:
    movl    -8(%ebp), %eax
    cmpl    -4(%ebp), %eax
    jl      .L3
    movl    $0, %eax
    leave
    ret

For example, gprof would say that main executed 1 time and foo executed 100 times. What I want to know is whether .L2 or .L3 executed 1M times, because then that is where I should concentrate my optimization effort. If my question is vague, please ask me to explain more. Thanks.


Answer 1:


I suggest using Callgrind (one of the Valgrind tools, and usually installed with it). This can gather statistics on a much more fine-grained level, and the kcachegrind tool is very good for visualising the results.
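To try this on the loop from the question, a small C function along the following lines could serve as a test case. It is only a reconstruction that I am assuming matches the posted assembly (a counter summed up to a bound); the name `sum_below` is made up for illustration, and the accumulator is unsigned so that it wraps the way `addl` does instead of overflowing a signed int.

```c
#include <assert.h>

/* Hypothetical reconstruction of the loop in the question:
 * n   corresponds to -4(%ebp)  (the bound, 5000000 in the post)
 * i   corresponds to -8(%ebp)  (the loop counter)
 * sum corresponds to -12(%ebp) (the accumulator)
 */
unsigned int sum_below(int n)
{
    assert(n >= 0);

    unsigned int sum = 0;       /* wraps modulo 2^32, like addl */

    /* The loop body is the .L3 block; the i < n test is the .L2 block. */
    for (int i = 0; i < n; i++)
        sum += (unsigned int)i;

    return sum;
}
```

Called from a `main` with `sum_below(5000000)`, built with `gcc -g`, and run as `valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes ./prog`, this should produce a `callgrind.out.<pid>` file with per-instruction costs and jump counts, which `callgrind_annotate` or kcachegrind can then display.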




Answer 2:


It depends on what OS you are using, but for this kind of profiling you generally want to use a sampling profiler rather than an instrumented profiler, e.g.

  • Linux: Zoom
  • Mac OS X: Instruments
  • Windows: VTune



Answer 3:


If you're on Linux, Zoom is an excellent choice.

If you're on Windows, LTProf might be able to do it.

On any platform, the low-tech method of random pausing can be relied on.

Don't look for how many times instructions are executed. Look for where the program counter is found a large fraction of the time. (They're not the same thing.) That will tell you where to concentrate your optimization efforts.
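A bit of arithmetic shows why the two differ. The sketch below uses a made-up cost model (the struct, the function name, and every number in the usage note are hypothetical, not measurements): each block has an execution count and an average cost per execution, and the fraction of time spent in a block is its count times its cost, divided by the total over all blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model for one basic block. */
struct block {
    const char   *label;        /* e.g. ".L3" */
    unsigned long executions;   /* how many times the block ran */
    unsigned long cycles_each;  /* assumed average cost per execution */
};

/* Fraction of the total modeled time spent in blocks[which].
 * This is roughly what the share of program-counter samples estimates. */
double time_fraction(const struct block *blocks, size_t n, size_t which)
{
    assert(which < n);

    double total = 0.0;
    for (size_t k = 0; k < n; k++)
        total += (double)blocks[k].executions * (double)blocks[k].cycles_each;

    if (total == 0.0)
        return 0.0;

    return (double)blocks[which].executions
         * (double)blocks[which].cycles_each / total;
}
```

With a block that runs 1,000,000 times at 1 cycle each and another that runs 10,000 times at 400 cycles each (say, because it misses the cache), the second block executes 100 times less often yet accounts for 4,000,000 of 5,000,000 cycles, or 80% of the time. An execution count would point at the first block; program-counter samples would point at the second.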




Answer 4:


KCachegrind gives profiling information for each line of source code, including CPU time, cache misses, and so on. It has saved my day a couple of times.

However, running the code inside the profiler is extremely slow (tens of times slower than native).



Source: https://stackoverflow.com/questions/4592335/automated-assembly-loop-level-profiling
