Plain C++ Code 10 times faster than inline assembler. Why?

没有蜡笔的小新 2021-02-07 23:05

These two code snippets do the same thing: they add two float arrays together and store the result back into them.

Inline Assembler:

void vecAdd_SSE(floa         


        
2 Answers
  •  生来不讨喜
    2021-02-07 23:25

    You aren't really calling a function that executes one SSE instruction, are you? There's non-trivial overhead involved in setting up the xmm registers, and you're copying the values from memory to the registers and back, which will take far longer than the actual calculation.

    I wouldn't be at all surprised to find that the compiler inlines the C++ version of the function, but doesn't (can't, really) do the same for functions that contain inline assembly.
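
    To make the point above concrete, here is a minimal sketch of the alternative, not the asker's original code (which is cut off above). The names vec_add_plain, vec_add_sse and HAVE_SSE are hypothetical, and the sketch assumes the sum is written back into the first array only. It keeps the per-call setup out of the hot path by handling the whole array inside one small function, and it uses SSE intrinsics from <xmmintrin.h> rather than an inline-asm block, since compilers can generally inline and optimize around intrinsics in a way they cannot with opaque assembly.

```cpp
#include <cstddef>

// Use SSE intrinsics only where the target supports them (assumption: x86 with SSE).
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#define HAVE_SSE 1      // hypothetical helper macro for this sketch
#else
#define HAVE_SSE 0
#endif

// Hypothetical plain C++ version: small enough that an optimizing compiler
// can inline it at the call site and may vectorize the loop on its own.
inline void vec_add_plain(float* dst, const float* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] += src[i];
    }
}

// Hypothetical intrinsics version: processes four floats per iteration with
// unaligned loads and stores, then finishes any leftover elements one by one.
inline void vec_add_sse(float* dst, const float* src, std::size_t n) {
    std::size_t i = 0;
#if HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        __m128 a = _mm_loadu_ps(dst + i);          // load 4 floats from dst
        __m128 b = _mm_loadu_ps(src + i);          // load 4 floats from src
        _mm_storeu_ps(dst + i, _mm_add_ps(a, b));  // dst[i..i+3] = a + b
    }
#endif
    for (; i < n; ++i) {  // scalar tail (or the whole array without SSE)
        dst[i] += src[i];
    }
}
```

    Calling either function once per array, instead of once per group of four floats, amortizes the call and register setup over many additions. Whether the intrinsics version actually beats the plain loop depends on the compiler and its flags, so it is worth measuring both in an optimized build.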
