I have an application where part of the inner loop was basically:
double sum = 0;
for (int i = 0; i != N; ++i, ++data, ++x) sum += *data * x;
Here's why the unsigned version is slower: many common architectures (including x86) have a hardware instruction to convert signed integers to double, but no hardware conversion from unsigned integers to double, so the compiler has to synthesize that conversion in software. Furthermore, the only unsigned multiply on Intel is a full-width multiply (mul), whereas signed multiplies can use the signed multiply-low instruction (imul).
GCC's software conversion from unsigned int to double may very well be suboptimal (it almost certainly is, given the magnitude of the slowdown you observed), but even with a good software sequence, the code is expected to be faster when using signed integers.
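As a quick way to see this for yourself (a hypothetical experiment of mine, not code from the original question), compile two tiny conversion functions with gcc -O2 -S and compare the generated assembly; the signed version should boil down to a single cvtsi2sd, while the unsigned version needs extra instructions on 32-bit x86:

// Hypothetical probe functions; compile with "gcc -O2 -S" and diff the output.
double convert_signed(int x) {
    return (double)x;            // one hardware cvtsi2sd
}

double convert_unsigned(unsigned int x) {
    return (double)x;            // needs a software fixup on 32-bit x86
}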
Assuming a smart compiler, the difference should be much smaller on a 64-bit system, because a 32-bit unsigned value can be zero-extended to 64 bits and then converted with the hardware 64-bit signed integer -> double conversion.
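For concreteness, here is a sketch of mine (not code from the question) of that trick at the source level: a 32-bit unsigned value always fits in a signed 64-bit integer, so it can be widened first and then handed to the hardware conversion.

// Sketch: on x86-64 this should compile to a zero-extending mov followed by
// a single cvtsi2sd with a 64-bit source operand.
double convert_u32_on_64bit(unsigned int u) {
    long long widened = (long long)u;   // always non-negative, no information lost
    return (double)widened;             // hardware signed 64-bit -> double conversion
}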
Edit: to illustrate, this:
sum += *data * x;
if the integer variables are signed, should compile into something along these lines:
mov (data), %eax        // load *data
imul %ecx, %eax         // signed multiply by x (in %ecx), low 32 bits
cvtsi2sd %eax, %xmm1    // hardware signed int -> double conversion
addsd %xmm1, %xmm0      // accumulate into sum
On the other hand, if the integer variables are unsigned, cvtsi2sd can't be used to do the conversion directly (it would treat the high bit as a sign bit), so a software workaround is required. I would expect to see something like this:
mov (data), %eax        // load *data
mul %ecx                // unsigned multiply; clobbers %edx and might be slower than imul
cvtsi2sd %eax, %xmm1    // convert as though it were a signed integer
test %eax, %eax         // check whether the high bit was set
jge 1f                  // if not, the converted value is already correct
addsd (2^32), %xmm1     // if it was, adjust the value by adding 2^32
1: addsd %xmm1, %xmm0   // accumulate into sum
That would be "acceptable" codegen for the unsigned -> double conversion; it could easily be worse.
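For what it's worth, here is roughly the same workaround expressed in C (my own sketch, assuming the two's-complement unsigned -> int conversion behavior that GCC on x86 provides; it is not literally what the compiler emits):

// Convert the bits as though they were a signed int, then correct the result
// if the high bit was set.
double convert_u32_manually(unsigned int u) {
    double d = (double)(int)u;   // cvtsi2sd on the raw bits
    if ((int)u < 0)              // high bit set?
        d += 4294967296.0;       // adjust by adding 2^32
    return d;
}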
All of this assumes that floating-point code is being generated for SSE (I believe that's the default with the Ubuntu toolchain, but I could be wrong).
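If you want to rule that out, you can force SSE code generation on 32-bit GCC explicitly; these flags do exist in GCC, though the file name here is just a placeholder:

gcc -O2 -msse2 -mfpmath=sse -S inner_loop.c   # emit assembly using SSE floating point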