Speed difference between using int and unsigned int when mixed with doubles

鱼传尺愫 2021-01-01 14:22

I have an application where part of the inner loop was basically:

```
double sum = 0;
for (int i = 0; i != N; ++i, ++data, ++x) sum += *data * x;
```

4 Answers

囚心锁ツ (OP)

2021-01-01 14:50
Here's why: many common architectures (including x86) have a hardware instruction to convert a signed int to a double, but no hardware conversion from unsigned to double, so the compiler needs to synthesize that conversion in software. Furthermore, the only unsigned multiply on Intel is a full-width multiply, whereas signed multiplies can use the signed multiply-low instruction (in practice a compiler can also use the latter for unsigned operands when only the low 32 bits of the product are needed, since those bits are the same either way).

GCC's software conversion from unsigned int to double may very well be suboptimal (it almost certainly is, given the magnitude of the slowdown that you observed), but it is expected behavior for the code to be faster when using signed integers.

Assuming a smart compiler, the difference should be much smaller on a 64-bit system, because a 64-bit signed integer -> double conversion can be used to efficiently do a 32-bit unsigned conversion.

Edit: to illustrate, this:
```
sum += *data * x;
```
if the integer variables are signed, should compile into something along these lines:
```
mov       (data),   %eax
imul      %ecx,     %eax
cvtsi2sd  %eax,     %xmm1
addsd     %xmm1,    %xmm0
```
On the other hand, if the integer variables are unsigned, cvtsi2sd can't be used to do the conversion directly, so a software workaround is required. I would expect to see something like this:
```
    mov       (data),   %eax
    mul       %ecx            // might be slower than imul
    cvtsi2sd  %eax,     %xmm1 // convert as though signed integer
    test      %eax,     %eax  // check whether the high bit was set
    jge       1f              // if not, the signed conversion is already correct
    addsd     (2^32),   %xmm1 // otherwise adjust the converted value by adding 2^32
1:  addsd     %xmm1,    %xmm0
```
That would be "acceptable" codegen for the unsigned -> double conversion; it could easily be worse.

All of this is assuming floating-point code generation to SSE (I believe this is the default on the Ubuntu tools, but I could be wrong).