Plain C++ Code 10 times faster than inline assembler. Why?

没有蜡笔的小新 2021-02-07 23:05

These two code snippets do the same thing: add two float arrays together and store the result back into both of them.

Inline Assembler:

void vecAdd_SSE(floa         
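For reference, here is a minimal plain C++ sketch of the operation the question describes. The question's own code is cut off in this copy, so the name `vecAddPlain`, its `float*` signature, and the fixed length of 4 are assumptions made for illustration, not the asker's code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical plain C++ version of the operation described in the question:
// add two 4-element float arrays and store the sum back into both of them.
void vecAddPlain(float* v1, float* v2) {
    assert(v1 != nullptr && v2 != nullptr);  // caller must pass two valid arrays
    for (std::size_t i = 0; i < 4; ++i) {
        v1[i] = v1[i] + v2[i];  // the sum goes into the first array
        v2[i] = v1[i];          // and is copied into the second
    }
}
```

Because the whole body is ordinary C++, the optimizer can see through it, inline it into the caller, and, when the inputs are compile-time constants, compute the result ahead of time.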


        
2 answers
  • 2021-02-07 23:25

    You aren't really calling a function that executes one SSE instruction, are you? There's non-trivial overhead involved in setting up the xmm registers, and you're copying the values from memory to the registers and back, which will take far longer than the actual calculation.

    I wouldn't be at all surprised to find that the compiler inlines the C++ version of the function, but doesn't (can't, really) do the same for functions that contain inline assembly.
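One common way to keep SSE while still letting the compiler inline the function is to use intrinsics instead of an `__asm` block. This is a swapped-in technique, not the asker's code: `vecAddIntrinsics` is a hypothetical name, and the sketch assumes an x86/x64 target where `<xmmintrin.h>` is available. MSVC also does not accept `__asm` blocks at all when targeting x64, so intrinsics are the usual route there.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

// Hypothetical intrinsics version of the 4-float add. Unlike an __asm block,
// these are ordinary operations to the compiler, so it can inline the function
// and optimize the loads and stores together with the caller.
inline void vecAddIntrinsics(float* v1, float* v2) {
    assert(v1 != nullptr && v2 != nullptr);
    const __m128 a = _mm_loadu_ps(v1);    // unaligned load of v1[0..3]
    const __m128 b = _mm_loadu_ps(v2);    // unaligned load of v2[0..3]
    const __m128 sum = _mm_add_ps(a, b);  // lane-wise single-precision add (ADDPS)
    _mm_storeu_ps(v1, sum);               // write the result back to v1
    _mm_storeu_ps(v2, sum);               // and to v2
}
```

The function still performs the memory round trip the answer describes (two loads, one add, two stores); the difference is that the compiler is free to optimize it in context rather than treating it as an opaque block.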

  • 2021-02-07 23:35

    On my machine (VS2015, 64-bit mode), the compiler inlines vecAdd_Std and produces:

    00007FF625921C8F  vmovups     xmm1,xmmword ptr [__xmm@4100000040c000004080000040000000 (07FF625929D60h)]  
    00007FF625921C97  vmovups     xmm4,xmm1  
    00007FF625921C9B  vcvtss2sd   xmm1,xmm1,xmm4  
    

    Test code

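    #include <iostream>

    // vecAdd_Std is the plain C++ function from the question (not repeated here).

    int main() {
        float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        float y[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    
        vecAdd_Std(x, y);  // inlined in the answerer's VS2015 x64 build
    
        std::cout << x[0];
    }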
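The constant loaded by the first `vmovups` suggests why this version is so fast. Assuming the 32 hex digits in the MSVC symbol name `__xmm@4100000040c000004080000040000000` spell the 128-bit value most-significant digit first (so the rightmost 8 digits are element 0), decoding them as IEEE-754 floats gives {2, 4, 6, 8}, which is exactly x + y for the test inputs. In other words, the compiler appears to have computed the whole addition at compile time. The sketch below performs that decoding; `decodeXmmHex` is a hypothetical helper, not part of any library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: decode the 32 hex digits of an MSVC "__xmm@..." constant
// name into four floats. Assumption: the digits are the 128-bit value written
// most-significant digit first, so the LAST 8 digits hold element 0.
std::array<float, 4> decodeXmmHex(const std::string& hex) {
    assert(hex.size() == 32);  // 128 bits = 32 hex digits
    std::array<float, 4> lanes{};
    for (int lane = 0; lane < 4; ++lane) {
        // Element 0 is the rightmost group of 8 hex digits.
        const std::string group = hex.substr(static_cast<std::size_t>((3 - lane) * 8), 8);
        const auto bits = static_cast<std::uint32_t>(std::stoul(group, nullptr, 16));
        float value;
        static_assert(sizeof(value) == sizeof(bits), "expects a 32-bit float");
        std::memcpy(&value, &bits, sizeof(value));  // reinterpret the bit pattern
        lanes[static_cast<std::size_t>(lane)] = value;
    }
    return lanes;
}
```

For example, `decodeXmmHex("4100000040c000004080000040000000")` returns {2.0f, 4.0f, 6.0f, 8.0f}.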
    