Are there unsigned equivalents of the x87 FILD and SSE CVTSI2SD instructions?

离开以前 2021-01-18 20:12

I want to implement the equivalent of C's uint-to-double cast in the GHC Haskell compiler. We already implement int-to-double

5 Answers
  • 2021-01-18 20:53

    We already implement int-to-double using FILD ...
    Is there unsigned versions of these operations

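    If you want to use exactly the x87 FILD opcode, shift the uint64 right by one bit (divide by 2) so that it fits in 63 bits, load it with FILD, and then multiply the result by 2 as a floating-point value. The x87 uint64-to-double conversion then costs one extra FMUL. Be aware that the shift discards the lowest bit, so an odd input loses it; the commented-out lines in the code below hint at how to save it and add it back.

    Example: 0xFFFFFFFFFFFFFFFFU -> +1.8446744073709551e+0019

    I was unable to post the code example at first because of the strict formatting rules, so it follows separately.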

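        //inline
        double    u64_to_d(unsigned _int64 v){
    
        //volatile double   res;
        volatile unsigned int tmp=2;   // multiplier, loaded onto the x87 stack below
        _asm{
        fild  dword ptr tmp            // st(0) = 2.0
        //v>>=1;
        shr   dword ptr v+4, 1         // shift the high dword; its low bit goes to CF
        rcr   dword ptr v, 1           // rotate CF into the top of the low dword
        fild  qword ptr v              // st(0) = v>>1 (sign bit now clear), st(1) = 2.0
    
        //save lsb
        //mov   byte ptr tmp, 0  
        //rcl   byte ptr tmp, 1
    
        //res=tmp+res*2;
        fmulp st(1),st                 // st(0) = (v>>1) * 2.0
        //fild  dword ptr tmp
        //faddp st(1),st 
    
        //fstp  qword ptr res
        }
    
        //return res;
        //fld  qword ptr res
        // the result is left in st(0), where a double is returned on x86
    }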

    x86 output produced by Visual C++:

            //inline
            double    u64_to_d(unsigned _int64 v){
        55                   push        ebp  
        8B EC                mov         ebp,esp  
        81 EC 04 00 00 00    sub         esp,04h  
    
            //volatile double   res;
            volatile unsigned int tmp=2;
        C7 45 FC 02 00 00 00 mov         dword ptr [tmp], 2  
            _asm{
            fild  dword ptr tmp
        DB 45 FC             fild        dword ptr [tmp]  
            //v>>=1;
            shr   dword ptr v+4, 1
        D1 6D 0C             shr         dword ptr [ebp+0Ch],1  
            rcr   dword ptr v, 1
        D1 5D 08             rcr         dword ptr [v],1  
            fild  qword ptr v
        DF 6D 08             fild        qword ptr [v]  
    
            //save lsb
        //    mov   byte ptr [tmp], 0  
        //C6 45 FC 00        mov         byte ptr [tmp], 0
        //    rcl   byte ptr tmp, 1
        //D0 55 FC           rcl         byte ptr [tmp],1  
    
            //res=tmp+res*2;
            fmulp st(1),st
        DE C9                fmulp       st(1),st  
        //    fild  dword ptr tmp
        //DB 45 FC           fild        dword ptr [tmp]  
        //    faddp st(1),st 
        //DE C1              faddp       st(1),st  
    
    
            //fstp  qword ptr res
            //fstp        qword ptr [res]  
        }
    
            //return res;
            //fld         qword ptr [res]  
    
        8B E5                mov         esp,ebp  
        5D                   pop         ebp  
        C3                   ret  
    }

    The code is now posted (I probably had to remove all the invalid ASCII characters from the text file by hand).
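
    The shift-then-double idea can also be sketched in portable C. This is only an illustration under stated assumptions, not what GHC or any particular compiler emits: the name `u64_to_double_halving` is made up here, and the signed cast stands in for FILD or CVTSI2SD. Instead of dropping the shifted-out bit, the sketch ORs it back into bit 0 so that the rounding of the signed conversion still takes it into account.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned 64-bit to double using only signed conversions. */
double u64_to_double_halving(uint64_t v) {
    if ((v >> 63) == 0) {
        /* Top bit clear: the signed conversion is already correct. */
        return (double)(int64_t)v;
    }
    /* Top bit set: halve so the value fits in 63 bits, but OR the lost
       bit back into bit 0 so that rounding still sees it. */
    uint64_t half = (v >> 1) | (v & 1);
    double d = (double)(int64_t)half;
    return d + d;   /* doubling is exact, so this undoes the halving */
}
```

    The cost over the signed conversion is a test of the top bit plus, on the slow path, a shift, an OR and one floating-point add.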

  • 2021-01-18 21:00

    As someone said, "Good Artists Copy; Great Artists Steal". So we can just check how other compiler writers solved this issue. I used a simple snippet:

    volatile unsigned int x;
    int main()
    {
      volatile double  y = x;
      return y;
    }
    

    (volatiles added to ensure the compiler does not optimize out the conversions)

    Results (irrelevant instructions skipped):

    Visual C++ 2010 cl /Ox (x86)

      __real@41f0000000000000 DQ 041f0000000000000r ; 4.29497e+009
    
      mov   eax, DWORD PTR ?x@@3IC          ; x
      fild  DWORD PTR ?x@@3IC           ; x
      test  eax, eax
      jns   SHORT $LN4@main
      fadd  QWORD PTR __real@41f0000000000000
    $LN4@main:
      fstp  QWORD PTR _y$[esp+8]
    

    So basically the compiler is adding an adjustment value in case the sign bit was set.
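
    Written out in C, the adjustment looks roughly like the sketch below. It is only an illustration of what the listing does; the name `u32_to_double_adjust` is made up, and the code assumes that casting a `uint32_t` above `INT32_MAX` to `int32_t` wraps around (implementation-defined in C, but what x86 compilers do).

```c
#include <stdint.h>

/* Hypothetical helper mirroring the FILD + conditional FADD sequence. */
double u32_to_double_adjust(uint32_t x) {
    int32_t s = (int32_t)x;     /* reinterpret as signed; assumes wrap-around */
    double d = (double)s;       /* signed conversion, as FILD dword does */
    if (s < 0) {                /* sign bit was set (the jns is not taken) */
        d += 4294967296.0;      /* 2^32, the constant 0x41F0000000000000 */
    }
    return d;
}
```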

    Visual C++ 2010 cl /Ox (x64)

      mov   eax, DWORD PTR ?x@@3IC          ; x
      pxor  xmm0, xmm0
      cvtsi2sd xmm0, rax
      movsdx    QWORD PTR y$[rsp], xmm0
    

    No need to adjust here because the compiler knows that rax will have the sign bit cleared.

    Visual C++ 2012 cl /Ox

      __xmm@41f00000000000000000000000000000 DB 00H, 00H, 00H, 00H, 00H, 00H, 00H
      DB 00H, 00H, 00H, 00H, 00H, 00H, 00H, 0f0H, 'A'
    
      mov   eax, DWORD PTR ?x@@3IC          ; x
      movd  xmm0, eax
      cvtdq2pd xmm0, xmm0
      shr   eax, 31                 ; 0000001fH
      addsd xmm0, QWORD PTR __xmm@41f00000000000000000000000000000[eax*8]
      movsd QWORD PTR _y$[esp+8], xmm0
    

    This uses branchless code to add 0 or the magic adjustment depending on whether the sign bit was cleared or set.
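
    The same table lookup can be sketched in C. Again this is an illustration, not the compiler's actual source: `u32_to_double_branchless` and the `adjust` table are names invented here, and the signed cast is assumed to wrap around.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the CVTDQ2PD + table-indexed ADDSD sequence. */
double u32_to_double_branchless(uint32_t x) {
    /* Index 0: nothing to add. Index 1: 2^32 (0x41F0000000000000). */
    static const double adjust[2] = { 0.0, 4294967296.0 };
    double d = (double)(int32_t)x;   /* signed conversion; assumes wrap-around cast */
    return d + adjust[x >> 31];      /* x >> 31 is the sign bit: 0 or 1 */
}
```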

  • 2021-01-18 21:05

    You can exploit some of the properties of the IEEE double format and interpret the unsigned value as part of the mantissa, while adding some carefully crafted exponent.

    Bits 63 62-52     51-0
         S  Exp       Mantissa
         0  1075      20 bits 0, followed by your unsigned int
    

    The 1075 comes from the IEEE exponent bias (1023) for doubles plus a "shift" amount of 52 bits for your mantissa. Note that there is an implicit "1" leading the mantissa, which needs to be subtracted later.

    So:

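    #include <stdint.h>
    #include <string.h>

    double uint32_to_double(uint32_t x) {
        uint64_t xx = x;
        xx += 1075ULL << 52;         // add the exponent
        double d;
        memcpy(&d, &xx, sizeof d);   // reinterpret the bits (a union also works)
        return d - (double)(1ULL << 52);   // subtract the implicit leading 1, i.e. 2^52
    }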
    

    If you don't have native 64-bit integers on your platform, a version that uses SSE for the integer steps might be beneficial, but that of course depends on the target.

    On my platform this compiles to

    0000000000000000 <uint32_to_double>:
       0:   48 b8 00 00 00 00 00    movabs $0x4330000000000000,%rax
       7:   00 30 43 
       a:   89 ff                   mov    %edi,%edi
       c:   48 01 f8                add    %rdi,%rax
       f:   c4 e1 f9 6e c0          vmovq  %rax,%xmm0
      14:   c5 fb 5c 05 00 00 00    vsubsd 0x0(%rip),%xmm0,%xmm0 
      1b:   00 
      1c:   c3                      retq
    

    which looks pretty good. The 0x0(%rip) is the magic double constant, and if inlined some instructions like the upper 32 bit zeroing and the constant reload will vanish.

  • 2021-01-18 21:05

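    If I'm understanding you correctly, you should be able to move your 32-bit uint to a temporary area on the stack, zero out the next dword, and then use fild qword ptr to load the resulting 64-bit integer as a double. FILD treats its operand as signed, but the zeroed upper dword guarantees that the sign bit is clear.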
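
    In C terms the memory layout would look something like the sketch below. It is a hedged illustration only: `u32_to_double_widen` and `slot` are names invented here, the layout assumes a little-endian target such as x86, and the final cast stands in for `fild qword ptr`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper mirroring "store dword, zero the next dword, FILD qword". */
double u32_to_double_widen(uint32_t x) {
    uint32_t slot[2];
    slot[0] = x;   /* low dword: the value (little-endian layout assumed) */
    slot[1] = 0;   /* high dword: zero, so the sign bit is clear */
    int64_t wide;
    memcpy(&wide, slot, sizeof wide);   /* reread the 8 bytes as one signed qword */
    return (double)wide;                /* signed conversion, as FILD qword does */
}
```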

  • 2021-01-18 21:09

    There is a better way:

    __m128d _mm_cvtsu32_sd(__m128i n) {
        const __m128i magic_mask = _mm_set_epi32(0, 0, 0x43300000, 0);
        const __m128d magic_bias = _mm_set_sd(4503599627370496.0);
        return _mm_sub_sd(_mm_castsi128_pd(_mm_or_si128(n, magic_mask)), magic_bias);
    }
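
    One possible way to call this from scalar code is sketched below, assuming an x86 target with SSE2. The names `cvt_u32_sd` and `u32_to_double_sse2` are made up for the illustration; the function body repeats the technique above unchanged, and it relies on bits 32..63 of the input register being zero, which `_mm_cvtsi32_si128` provides.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>

/* Same technique under a hypothetical name; expects the 32-bit value in
   the low lane of n with bits 32..63 zero. */
static inline __m128d cvt_u32_sd(__m128i n) {
    const __m128i magic_mask = _mm_set_epi32(0, 0, 0x43300000, 0);   /* exponent 1075 */
    const __m128d magic_bias = _mm_set_sd(4503599627370496.0);       /* 2^52 */
    return _mm_sub_sd(_mm_castsi128_pd(_mm_or_si128(n, magic_mask)), magic_bias);
}

/* Hypothetical scalar wrapper around the vector routine. */
double u32_to_double_sse2(uint32_t x) {
    __m128i n = _mm_cvtsi32_si128((int)x);   /* low lane = x, all upper bits zero */
    return _mm_cvtsd_f64(cvt_u32_sd(n));     /* extract the low double */
}
```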
    