Why is 'add' taking so long in my application?

Submitted by 南笙酒味 on 2021-01-28 11:13:20

Question


I'm profiling an application using Intel VTune, and there is one particular hotspot where I'm copying a __m128i member variable in the copy constructor of a C++ class.
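
For illustration only, here is a minimal sketch of the kind of class described above. The real class is not shown in this question, so the names `Packed`, `bits_`, `first`, and `copy_all` are hypothetical. It assumes an x86 target with SSE2, and that the member copy compiles to roughly one 16-byte load and one 16-byte store.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics: __m128i, _mm_set1_epi32, _mm_cvtsi128_si32
#include <cassert>
#include <vector>

// Hypothetical class: all names are illustrative, not taken from the real application.
class Packed {
public:
    // Fill all four 32-bit lanes with the same value.
    explicit Packed(int v) : bits_(_mm_set1_epi32(v)) {}

    // Copy constructor: copies the single 16-byte member.
    Packed(const Packed& other) : bits_(other.bits_) {}

    // Return the lowest 32-bit lane so the contents can be inspected.
    int first() const { return _mm_cvtsi128_si32(bits_); }

private:
    __m128i bits_;  // 16-byte, 16-byte-aligned SSE value
};

// Copy a whole vector, invoking the copy constructor once per element.
std::vector<Packed> copy_all(const std::vector<Packed>& src) {
    std::vector<Packed> out(src);
    assert(out.size() == src.size());
    return out;
}
```

Calling `copy_all` on a large `std::vector<Packed>` would produce a tight load/store loop over the elements, which is the sort of code the profile below seems to show.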

VTune gives this breakdown:

Instruction                  CPU Time: Total        CPU Time: Self

Block 1:
vmovdqa64x (%rax), %xmm0     4.1%                   0.760s
add $0x10, %rax              46.6%                  8.594s

Block 2:
vmovapsx %xmm0, -0x10(%rdx)  6.5%                   1.204s

(If it matters, the compiler is gcc 7.4.0.)

I admit I'm an assembly noob, but it's very surprising that one particular add instruction is taking up 46% of my application time, given that the app is doing lots of other complex things and add is such a trivial operation.

Am I misinterpreting the profiling output somehow? Is there a path to optimize this other than "copy that variable less"?

Source: https://stackoverflow.com/questions/60232283/why-is-add-taking-so-long-in-my-application
