What is the fastest way to convert float to int on x86

轻奢々 2020-11-28 11:26

What is the fastest way you know to convert a floating-point number to an int on an x86 CPU? Preferably in C or assembly (that can be in-lined in C) for any combination of

10 answers
  • 2020-11-28 11:42

    It depends on whether you want a truncating conversion or a rounding one, and at what precision. By default, C performs a truncating conversion when you go from float to int. There are FPU instructions that do the conversion, but the result is not an ANSI C conversion, and there are significant caveats to using them (such as knowing the FPU rounding state). Since the answer to your problem is quite complex and depends on some variables you haven't expressed, I recommend this article on the issue:

    http://www.stereopsis.com/FPU.html
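
To make the truncating/rounding distinction concrete, here is a minimal sketch using the scalar SSE conversion intrinsics from `<xmmintrin.h>`. The function names `to_int_trunc` and `to_int_round` are illustrative and do not come from the linked article, and the rounding variant assumes the MXCSR rounding mode is still at its default (round to nearest, ties to even).

```c
#include <xmmintrin.h>

/* Truncates toward zero, matching the C cast (int)f. */
int to_int_trunc(float f)
{
    return _mm_cvttss_si32(_mm_set_ss(f));
}

/* Rounds according to the current MXCSR rounding mode
   (assumed here to be the default: nearest, ties to even). */
int to_int_round(float f)
{
    return _mm_cvtss_si32(_mm_set_ss(f));
}
```

Under those assumptions, `to_int_trunc(2.7f)` yields 2 while `to_int_round(2.7f)` yields 3, and a tie such as 2.5f rounds to the even neighbour 2.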

  • 2020-11-28 11:44

    There is one instruction to convert a floating point to an int in assembly: use the FISTP instruction. It pops the value off the floating-point stack, converts it to an integer, and then stores it at the address specified. I don't think there would be a faster way (unless you use extended instruction sets like MMX or SSE, which I am not familiar with).

    Another instruction, FIST, leaves the value on the FP stack, but it only supports word and double-word destinations; a quad-word store requires FISTP.

  • 2020-11-28 11:48

    I assume truncation is required, same as if one writes i = (int)f in "C".

    If you have SSE3, you can use:

    int convert(float x)
    {
        int n;
        __asm {
            fld x
            fisttp n // the extra 't' means truncate
        }
        return n;
    }
    

    Alternatively, with SSE2 (or in x64, where inline assembly might not be available), you can use the following, which is almost as fast:

    #include <xmmintrin.h>
    int convert(float x)
    {
        return _mm_cvtt_ss2si(_mm_load_ss(&x)); // extra 't' means truncate
    }
    

    On older computers there is an option to set the rounding mode manually and perform the conversion using the ordinary fistp instruction. That will probably only work for arrays of floats; otherwise, care must be taken not to use any constructs that would make the compiler change the rounding mode (such as casting). It is done like this:

    void Set_Trunc()
    {
        // cw is a 16-bit register [_ _ _ ic rc1 rc0 pc1 pc0 iem _ pm um om zm dm im]
        __asm {
            push ax // use stack to store the control word
            fnstcw word ptr [esp]
            fwait // needed to make sure the control word is there
            mov ax, word ptr [esp] // or pop ax ...
            or ax, 0xc00 // set both rc bits (alternately "or ah, 0xc")
            mov word ptr [esp], ax // ... and push ax
            fldcw word ptr [esp]
            pop ax
        }
    }
    
    void convertArray(int *dest, const float *src, int n)
    {
        Set_Trunc();
        __asm {
            mov eax, src
            mov edx, dest
            mov ecx, n // load loop variables
    
            cmp ecx, 0
            je bottom // handle zero-length arrays
    
        top:
            fld dword ptr [eax]
            fistp dword ptr [edx]
            add eax, 4 // advance to the next source float
            add edx, 4 // advance to the next destination int
            loop top // decrement ecx, jump to top
        bottom:
        }
    }
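
The `or ax, 0xc00` step in Set_Trunc relies on the layout of the x87 control word, where the two rounding-control (rc) bits sit at positions 10 and 11. As a rough sketch of that bit arithmetic in plain C (the macro and function names are hypothetical, introduced only for illustration):

```c
#include <stdint.h>

/* Rounding-control (RC) field of the x87 control word: bits 10 and 11. */
#define X87_RC_MASK     0x0C00u
#define X87_RC_NEAREST  0x0000u  /* rc = 00: round to nearest (even) */
#define X87_RC_DOWN     0x0400u  /* rc = 01: round toward -infinity */
#define X87_RC_UP       0x0800u  /* rc = 10: round toward +infinity */
#define X87_RC_TRUNCATE 0x0C00u  /* rc = 11: round toward zero */

/* Returns cw with its RC field replaced by rc; all other bits are kept. */
uint16_t x87_cw_set_rounding(uint16_t cw, uint16_t rc)
{
    return (uint16_t)((cw & ~X87_RC_MASK) | (rc & X87_RC_MASK));
}
```

For example, starting from 0x037F (the value FINIT loads), selecting truncation gives 0x0F7F, the same result as OR-ing in 0xc00. Unlike a plain OR, clearing the field first also lets the helper switch back to another mode, which is useful because Set_Trunc as written never restores the previous control word.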
    

    Note that the inline assembly only works with Microsoft's Visual Studio compilers (and maybe Borland); it would have to be rewritten in GNU assembly syntax in order to compile with gcc. The SSE2 solution with intrinsics should be quite portable, however.
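
For reference, a GNU-style rewrite of the scalar conversions might look roughly like the sketch below. It assumes GCC or Clang targeting x86 or x86-64, uses AT&T syntax, and relies on the `"t"` constraint to place the input on top of the x87 stack. The function names are illustrative, and `fisttpl` additionally assumes a CPU with SSE3.

```c
/* Assumes GCC or Clang on x86 / x86-64 (GNU extended asm, AT&T syntax). */

/* fistp: converts using the current x87 rounding mode (default: nearest). */
int convert_fistp(float x)
{
    int n;
    /* "t" places x in st(0); fistpl pops it, hence the "st" clobber. */
    __asm__ __volatile__ ("fistpl %0" : "=m" (n) : "t" (x) : "st");
    return n;
}

/* fisttp (SSE3): always truncates, regardless of the rounding mode. */
int convert_fisttp(float x)
{
    int n;
    __asm__ __volatile__ ("fisttpl %0" : "=m" (n) : "t" (x) : "st");
    return n;
}
```

With the default control word, `convert_fistp(2.7f)` rounds to 3, whereas `convert_fisttp(2.7f)` truncates to 2.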

    Other rounding modes are possible by different SSE2 intrinsics or by manually setting the FPU control word to a different rounding mode.
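
On the SSE side, one way to get a different rounding mode is to change the rounding field in MXCSR around a non-truncating conversion. The sketch below uses the `_MM_GET_ROUNDING_MODE` / `_MM_SET_ROUNDING_MODE` helpers and the `_MM_ROUND_*` constants from `<xmmintrin.h>`; the function name `convert_with_mode` is made up for illustration. The `volatile` temporaries are an assumption-laden precaution: compilers generally assume the default rounding mode and may otherwise fold or move the conversion.

```c
#include <xmmintrin.h>

/* Converts x using the given MXCSR rounding mode (one of _MM_ROUND_NEAREST,
   _MM_ROUND_DOWN, _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO), then restores the
   mode that was active on entry. */
int convert_with_mode(float x, unsigned int mode)
{
    unsigned int saved = _MM_GET_ROUNDING_MODE();
    _MM_SET_ROUNDING_MODE(mode);

    /* volatile discourages constant folding and reordering of the
       conversion relative to the mode changes. */
    volatile float in = x;
    volatile int out = _mm_cvtss_si32(_mm_set_ss(in));

    _MM_SET_ROUNDING_MODE(saved);
    return out;
}
```

Switching MXCSR on every call is unlikely to be fast, so in practice the mode would be set once around a whole batch of conversions, much like Set_Trunc is called once before the array loop.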

  • 2020-11-28 11:52

    Packed conversion using SSE is by far the fastest method, since you can convert multiple values in the same instruction. ffmpeg has a lot of assembly for this (mostly for converting the decoded output of audio to integer samples); check it for some examples.
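
As a small illustration of packed conversion (not taken from ffmpeg), the following sketch converts four floats per iteration with the SSE2 intrinsic `_mm_cvttps_epi32`, which truncates like a C cast. The function name `convert_array_sse2` is hypothetical, and unaligned loads and stores are used so that no alignment is assumed for either array.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvttps_epi32 */
#include <stddef.h>

/* Converts n floats to ints with truncation, four at a time. */
void convert_array_sse2(int *dest, const float *src, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 f = _mm_loadu_ps(src + i);            /* unaligned load of 4 floats */
        __m128i v = _mm_cvttps_epi32(f);             /* 4 truncating conversions */
        _mm_storeu_si128((__m128i *)(dest + i), v);  /* unaligned store of 4 ints */
    }
    for (; i < n; ++i)
        dest[i] = (int)src[i];                       /* scalar tail */
}
```

If rounding is wanted instead of truncation, `_mm_cvtps_epi32` is the packed counterpart that honours the current MXCSR rounding mode.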

  • 2020-11-28 11:52

    If you really care about the speed of this, make sure your compiler is generating the FIST instruction. In MSVC you can do this with /QIfist; see this MSDN overview.

    You can also consider using SSE intrinsics to do the work for you; see this article from Intel: http://softwarecommunity.intel.com/articles/eng/2076.htm

  • 2020-11-28 11:54

    Since MS screws us out of inline assembly in x64 and forces us to use intrinsics, I looked up which one to use. The MSDN doc gives _mm_cvtsd_si64x with an example.

    The example works, but it is horribly inefficient: it uses an unaligned load of 2 doubles where a single load is all we need, and dropping it also gets rid of the additional alignment requirement. It then produces a lot of needless loads and reloads, but they can be eliminated as follows:

     #include <intrin.h>
     #pragma intrinsic(_mm_cvtsd_si64x)
     long long _inline double2int(const double &d)
     {
         return _mm_cvtsd_si64x(*(__m128d*)&d);
     }
    

    Result:

            i=double2int(d);
    000000013F651085  cvtsd2si    rax,mmword ptr [rsp+38h]  
    000000013F65108C  mov         qword ptr [rsp+28h],rax  
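
A variant that avoids the pointer cast is sketched below in C. It loads exactly one double with `_mm_load_sd` and uses `_mm_cvtsd_si64` and `_mm_cvttsd_si64` from `<emmintrin.h>`; whether a given compiler emits the same two-instruction sequence as above has not been verified here. The function names are illustrative, and an x86-64 target is assumed.

```c
#include <emmintrin.h>  /* SSE2 */

/* Assumes an x86-64 target; the 64-bit forms are not available in 32-bit builds. */

/* Rounds using the current MXCSR rounding mode (default: nearest). */
long long double_to_int64(double d)
{
    return _mm_cvtsd_si64(_mm_load_sd(&d));
}

/* Truncates toward zero, like the C cast (long long)d. */
long long double_to_int64_trunc(double d)
{
    return _mm_cvttsd_si64(_mm_load_sd(&d));
}
```

The first function follows the rounding mode, so 2.5 becomes 2 under the default mode, while the second always discards the fraction.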
    

    The rounding mode can be set without inline assembly, e.g.

        _control87(_RC_NEAR,_MCW_RC);
    

    where rounding to nearest is default (anyway).

    The question whether to set the rounding mode at each call or to assume it will be restored (third party libs) will have to be answered by experience, I guess. You will have to include float.h for _control87() and related constants.

    And, no, this will not work in 32 bits, so keep using the FISTP instruction:

    _asm fld d
    _asm fistp i
    