Speed difference between using int and unsigned int when mixed with doubles

鱼传尺愫 2021-01-01 14:22

I have an application where part of the inner loop was basically:

double sum = 0;
for (int i = 0; i != N; ++i, ++data, ++x) sum += *data * x;
4 Answers
  •  囚心锁ツ
    2021-01-01 14:50

    Here's why: many common architectures (including x86) have a hardware instruction to convert a signed int to a double, but have no hardware conversion from unsigned int to double, so the compiler needs to synthesize the conversion in software. Furthermore, the only unsigned multiply on Intel is a full-width multiply, whereas signed multiplies can use the signed multiply-low instruction (although, since the low 32 bits of the product are the same for signed and unsigned operands, a compiler is free to use imul in both cases).

    GCC's software conversion from unsigned int to double may very well be suboptimal (it almost certainly is, given the magnitude of the slowdown that you observed), but it is expected behavior for the code to be faster when using signed integers.

    Assuming a smart compiler, the difference should be much smaller on a 64-bit system, because a 64-bit signed integer -> double conversion can be used to efficiently do a 32-bit unsigned conversion.
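
    To make the 64-bit trick concrete, here is a minimal C sketch of the idea. The helper name u32_to_double_via_i64 is hypothetical and only for illustration; whether a given compiler actually emits a single 64-bit cvtsi2sd for it is an assumption, not something verified here.

    ```c
    #include <stdint.h>

    /* Hypothetical helper, for illustration only.
     * A 32-bit unsigned value always fits in a signed 64-bit integer,
     * so it can be widened first and then converted with an ordinary
     * signed 64-bit -> double conversion, with no fix-up step. */
    static double u32_to_double_via_i64(uint32_t u)
    {
        int64_t wide = (int64_t)u;  /* zero-extended, never negative */
        return (double)wide;        /* signed conversion gives the unsigned value */
    }
    ```

    Every 32-bit value is exactly representable in a double, so the result should match a plain (double)u cast; the only question is which instructions the compiler picks.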

    Edit: to illustrate, this:

    sum += *data * x;
    

    if the integer variables are signed, should compile into something along these lines:

    mov       (data),   %eax
    imul      %ecx,     %eax
    cvtsi2sd  %eax,     %xmm1
    addsd     %xmm1,    %xmm0
    

    On the other hand, if the integer variables are unsigned, cvtsi2sd can't be used directly to do the conversion, so a software workaround is required. I would expect to see something like this:

        mov       (data),   %eax
        mul       %ecx            // might be slower than imul
        cvtsi2sd  %eax,     %xmm1 // convert as though signed integer
        test      %eax,     %eax  // check whether the high bit was set
        jge       1f              // if it was not, skip the adjustment
        addsd     (2^32),   %xmm1 // otherwise add 2^32 to the converted value
    1:  addsd     %xmm1,    %xmm0
    

    That would be "acceptable" codegen for the unsigned -> double conversion; it could easily be worse.
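
    The same workaround can be written out in C to show what the instruction sequence above computes. This is only a sketch of the technique; the function name u32_to_double_signed_fixup is made up for illustration, and it is not meant to reflect what GCC actually generates.

    ```c
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper, for illustration only.
     * Mirrors the assembly sketch: convert the bits as if they were a
     * signed 32-bit integer, then add 2^32 when the high bit was set. */
    static double u32_to_double_signed_fixup(uint32_t u)
    {
        int32_t s;
        memcpy(&s, &u, sizeof s);   /* reinterpret the bits as signed */

        double d = (double)s;       /* the "cvtsi2sd" step */
        if (s < 0)                  /* high bit set: result is off by 2^32 */
            d += 4294967296.0;      /* 2^32 */
        return d;
    }
    ```

    The branch and the extra add are the cost that the signed version avoids, which is consistent with the unsigned loop being slower.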

    All of this is assuming floating-point code generation to SSE (I believe this is the default on the Ubuntu tools, but I could be wrong).
