I want to implement the equivalent of C's uint-to-double cast in the GHC Haskell compiler. We already implement int-to-double using FILD ...
Are there unsigned versions of these operations?
If you want to use exactly the x87 FILD opcode, just shift the uint64 down to 63 bits (divide by 2), convert that, and then multiply it by 2 again, but already as a double. So the x87 uint64-to-double conversion costs one extra FMUL (a plain-C sketch of the idea appears after the compiler output below).
For example: 0xFFFFFFFFFFFFFFFFU -> +1.8446744073709551e+0019
I was unable to post the code example because of the strict formatting rules. I'll try later.
//inline
double u64_to_d(unsigned __int64 v){
    //volatile double res;
    volatile unsigned int tmp=2;
    __asm{
        fild dword ptr tmp      // push 2.0 onto the x87 stack
        //v>>=1;
        shr dword ptr v+4, 1    // shift the high dword right by 1; its low bit -> CF
        rcr dword ptr v, 1      // rotate CF into the low dword: v >>= 1 overall
        fild qword ptr v        // load v/2 as a signed 64-bit integer (sign bit is now clear)
        //save lsb
        //mov byte ptr tmp, 0
        //rcl byte ptr tmp, 1
        //res=tmp+res*2;
        fmulp st(1),st          // multiply by 2.0; result is left in st(0) as the return value
        //fild dword ptr tmp
        //faddp st(1),st
        //fstp qword ptr res
    }
    //return res;
    //fld qword ptr res
}
VC-produced x86 output:
//inline
double u64_to_d(unsigned __int64 v){
55 push ebp
8B EC mov ebp,esp
81 EC 04 00 00 00 sub esp,04h
//volatile double res;
volatile unsigned int tmp=2;
C7 45 FC 02 00 00 00 mov dword ptr [tmp], 2
__asm{
fild dword ptr tmp
DB 45 FC fild dword ptr [tmp]
//v>>=1;
shr dword ptr v+4, 1
D1 6D 0C shr dword ptr [ebp+0Ch],1
rcr dword ptr v, 1
D1 5D 08 rcr dword ptr [v],1
fild qword ptr v
DF 6D 08 fild qword ptr [v]
//save lsb
// mov byte ptr [tmp], 0
//C6 45 FC 00 mov byte ptr [tmp], 0
// rcl byte ptr tmp, 1
//D0 55 FC rcl byte ptr [tmp],1
//res=tmp+res*2;
fmulp st(1),st
DE C9 fmulp st(1),st
// fild dword ptr tmp
//DB 45 FC fild dword ptr [tmp]
// faddp st(1),st
//DE C1 faddp st(1),st
//fstp qword ptr res
//fstp qword ptr [res]
}
//return res;
//fld qword ptr [res]
8B E5 mov esp,ebp
5D pop ebp
C3 ret
}
I posted it (I probably removed all the incorrect ASCII chars in the text file manually).
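For comparison, here is a plain-C sketch of the same halve-then-double idea (my own restatement with a made-up name, not part of the original post); it also adds the dropped low bit back in, which is what the commented-out lines above are aiming at:
#include <stdint.h>

/* Convert uint64 to double using only a signed 64-bit conversion, the way
   FILD provides it. v>>1 always fits in int64_t; the final addition restores
   the bit that the shift dropped (the result may still differ from a
   correctly rounded conversion by one ulp in rare double-rounding cases). */
double u64_to_d_portable(uint64_t v) {
    double d = (double)(int64_t)(v >> 1) * 2.0;
    return d + (double)(uint32_t)(v & 1);
}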
As someone said, "Good Artists Copy; Great Artists Steal". So we can just check how other compiler writers solved this issue. I used a simple snippet:
volatile unsigned int x;
int main()
{
    volatile double y = x;
    return y;
}
(volatiles added to ensure the compiler does not optimize out the conversions)
Results (irrelevant instructions skipped). First, the 32-bit build, which uses x87:
__real@41f0000000000000 DQ 041f0000000000000r ; 4.29497e+009
mov eax, DWORD PTR ?x@@3IC ; x
fild DWORD PTR ?x@@3IC ; x
test eax, eax
jns SHORT $LN4@main
fadd QWORD PTR __real@41f0000000000000
$LN4@main:
fstp QWORD PTR _y$[esp+8]
So basically the compiler does a signed conversion and then adds an adjustment value (2^32) in case the sign bit was set.
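In C, what this code path does would look roughly like the following (a sketch, not the compiler's actual source; the helper name is mine):
#include <stdint.h>

/* Signed conversion (what FILD gives you), then add 2^32 back
   if the top bit was set. */
double u32_to_d_adjust(uint32_t x) {
    double d = (double)(int32_t)x;
    if ((int32_t)x < 0)
        d += 4294967296.0;   /* + 2^32 */
    return d;
}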
The 64-bit build:
mov eax, DWORD PTR ?x@@3IC ; x
pxor xmm0, xmm0
cvtsi2sd xmm0, rax
movsdx QWORD PTR y$[rsp], xmm0
No need to adjust here, because the compiler knows that rax will have the sign bit cleared (the 32-bit mov into eax zero-extends the value into rax).
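The C equivalent of this path is simply widening to a signed 64-bit type first (again just a sketch with a hypothetical name):
#include <stdint.h>

/* On 64-bit targets a uint32 always fits in int64, so a plain
   signed 64-bit conversion is enough. */
double u32_to_d_widen(uint32_t x) {
    return (double)(int64_t)x;   /* zero-extend, then signed convert */
}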
The 32-bit build with SSE2 enabled:
__xmm@41f00000000000000000000000000000 DB 00H, 00H, 00H, 00H, 00H, 00H, 00H
DB 00H, 00H, 00H, 00H, 00H, 00H, 00H, 0f0H, 'A'
mov eax, DWORD PTR ?x@@3IC ; x
movd xmm0, eax
cvtdq2pd xmm0, xmm0
shr eax, 31 ; 0000001fH
addsd xmm0, QWORD PTR __xmm@41f00000000000000000000000000000[eax*8]
movsd QWORD PTR _y$[esp+8], xmm0
This uses branchless code to add either 0.0 or the magic adjustment (again 2^32), depending on whether the sign bit was clear or set.
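Roughly the same thing in C, using a two-entry table indexed by the sign bit (only a sketch with a made-up name):
#include <stdint.h>

/* Branchless variant: index a small table with the sign bit. */
double u32_to_d_branchless(uint32_t x) {
    static const double adjust[2] = { 0.0, 4294967296.0 };  /* 0 or 2^32 */
    return (double)(int32_t)x + adjust[x >> 31];
}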
You can exploit some of the properties of the IEEE double format and interpret the unsigned value as part of the mantissa, while adding a carefully crafted exponent.
Bits:    63   62-52   51-0
Field:   S    Exp     Mantissa
Value:   0    1075    20 zero bits, followed by your unsigned int
The 1075 comes from the IEEE exponent bias for doubles (1023) plus a "shift" amount of 52 bits for your mantissa. Note that there is an implicit "1" leading the mantissa, which contributes 2^52 and needs to be subtracted later.
So:
#include <stdint.h>
#include <string.h>

double uint32_to_double(uint32_t x) {
    uint64_t xx = x;
    xx += 1075ULL << 52;             // add the biased exponent (1023 + 52)
    double d;
    memcpy(&d, &xx, sizeof d);       // reinterpret the bits (or use a union)
    return d - (double)(1ULL << 52); // subtract the implicit leading 1: 2^52
}
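For example, with x = 1 the bit pattern becomes 0x4330000000000001, which as a double is 2^52 + 1 = 4503599627370497.0; subtracting 2^52 then gives exactly 1.0.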
If you don't have native 64-bit arithmetic on your platform, a version using SSE for the integer steps might be beneficial, but that depends on the target, of course.
On my platform this compiles to
0000000000000000 <uint32_to_double>:
0: 48 b8 00 00 00 00 00 movabs $0x4330000000000000,%rax
7: 00 30 43
a: 89 ff mov %edi,%edi
c: 48 01 f8 add %rdi,%rax
f: c4 e1 f9 6e c0 vmovq %rax,%xmm0
14: c5 fb 5c 05 00 00 00 vsubsd 0x0(%rip),%xmm0,%xmm0
1b: 00
1c: c3 retq
which looks pretty good. The 0x0(%rip) operand is the magic double constant, and if the function is inlined, some instructions such as the zeroing of the upper 32 bits and the constant reload will vanish.
If I'm understanding you correctly, you should be able to move your 32-bit uint to a temp area on the stack, zero out the next dword, and then use fild qword ptr to load the now-64-bit integer as a double; since its top bit is zero, the signed load gives the correct unsigned value.
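A minimal sketch of that idea in MSVC-style inline assembly (my own illustration with a made-up name, assuming a 32-bit build where a double is returned in st(0), as in the code earlier in this thread):
double u32_to_d_fild(unsigned int x) {
    unsigned __int64 tmp = x;    // stored to a stack qword; the upper dword is zero
    __asm fild qword ptr tmp     // top bit is clear, so the signed load is exact
    // no return statement on purpose: the result is already in st(0)
}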
There is a better way:
#include <emmintrin.h>

// n holds the unsigned 32-bit value in its low 32 bits
__m128d _mm_cvtsu32_sd(__m128i n) {
    const __m128i magic_mask = _mm_set_epi32(0, 0, 0x43300000, 0); // exponent bits of 2^52
    const __m128d magic_bias = _mm_set_sd(4503599627370496.0);     // 2^52
    return _mm_sub_sd(_mm_castsi128_pd(_mm_or_si128(n, magic_mask)), magic_bias);
}
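For reference, a scalar wrapper around it might look like this (my own sketch, assuming SSE2 intrinsics from <emmintrin.h>):
#include <stdint.h>
#include <emmintrin.h>

/* Load the uint32 into the low lane, convert, and extract the double. */
double uint32_to_double_sse(uint32_t x) {
    return _mm_cvtsd_f64(_mm_cvtsu32_sd(_mm_cvtsi32_si128((int)x)));
}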