Plain C++ Code 10 times faster than inline assembler. Why?

没有蜡笔的小新 2021-02-07 23:05

These two code snippets do the same thing: they add two float arrays together and store the result back into them.

Inline Assembler:

void vecAdd_SSE(floa         


        
2 Answers
  •  青春惊慌失措
    2021-02-07 23:35

    On my machine (VS2015 64-bit mode), the compiler inlines vecAdd_Std and produces

    00007FF625921C8F  vmovups     xmm1,xmmword ptr [__xmm@4100000040c000004080000040000000 (07FF625929D60h)]  
    00007FF625921C97  vmovups     xmm4,xmm1  
    00007FF625921C9B  vcvtss2sd   xmm1,xmm1,xmm4  
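
The symbol name of the constant loaded above looks like it spells out the 128-bit value itself. As a rough check, the sketch below decodes its four 32-bit words as floats. It assumes that MSVC writes the highest lane first in the name and that `float` is IEEE-754 binary32; `bits_to_float` and `decode_xmm_constant` are names made up for this illustration. The lanes come out as 2, 4, 6 and 8, which are exactly the sums of the test inputs, so the additions appear to have been folded at compile time.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a single-precision float (assumes IEEE-754 binary32).
inline float bits_to_float(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Words taken from __xmm@4100000040c000004080000040000000, listed lane 0 first.
// Assumption: the symbol name prints the highest lane first, so it is read right to left.
inline std::array<float, 4> decode_xmm_constant() {
    const std::uint32_t words[4] = {0x40000000u, 0x40800000u, 0x40c00000u, 0x41000000u};
    std::array<float, 4> lanes{};
    for (int i = 0; i < 4; ++i) {
        lanes[i] = bits_to_float(words[i]);
    }
    return lanes;
}
```

Calling `decode_xmm_constant()` and comparing each lane against `x[i] + y[i]` from the test inputs is a quick way to confirm that no addition is left for run time.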
    

    Test code

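    #include <iostream>

    int main() {
        // Both inputs are compile-time constants, so the optimizer can see every value.
        float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        float y[4] = {1.0f, 2.0f, 3.0f, 4.0f};

        vecAdd_Std(x, y);  // the plain C++ version from the question

        std::cout << x[0];
    }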
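
A test like this hands the optimizer constant inputs, so it may measure nothing at all. One possible way to keep the additions at run time is to read the inputs through `volatile`, as in the sketch below. This is an illustration rather than the original benchmark: `vecAdd_Std_sketch` is a hypothetical stand-in written from the question's description (element-wise sum stored back into both arrays), and `add_with_opaque_inputs` is likewise an invented name.

```cpp
#include <array>

// Hypothetical stand-in for the question's vecAdd_Std, whose body is not shown here:
// adds the arrays element-wise and writes the sums back into both of them.
inline void vecAdd_Std_sketch(float* a, float* b) {
    for (int i = 0; i < 4; ++i) {
        a[i] = a[i] + b[i];
        b[i] = a[i];
    }
}

// Loads the inputs through volatile reads so the compiler cannot treat them as
// known constants; the additions then have to be performed at run time.
inline std::array<float, 4> add_with_opaque_inputs() {
    volatile float seed[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float x[4];
    float y[4];
    for (int i = 0; i < 4; ++i) {
        x[i] = seed[i];
        y[i] = seed[i];
    }
    vecAdd_Std_sketch(x, y);
    std::array<float, 4> out = {{x[0], x[1], x[2], x[3]}};
    return out;
}
```

The same trick applies to the inline assembler version, so that both variants are timed on work the compiler could not precompute.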
    
