SSE optimized emulation of 64-bit integers

时光取名叫无心 2021-02-08 09:33

For a hobby project I'm working on, I need to emulate certain 64-bit integer operations on an x86 CPU, and it needs to be fast.

Currently, I'm doing this via M

1 Answer
  • 2021-02-08 10:23

    SSE2 has direct support for some 64-bit integer operations:

    Set both elements to 0:

    __m128i z = _mm_setzero_si128();
    

    Set both elements to 1:

    __m128i z = _mm_set1_epi64x(1);      // also works for variables.
    __m128i z = _mm_set_epi64x(hi, lo);  // elements can be different
    
    __m128i z = _mm_set_epi32(0,1,0,1);  // if any compilers refuse int64_t in 32-bit mode.  (None of the major ones do.)
    

    Set/load the low 64 bits, zero-extending to __m128i:

    // supported even in 32-bit mode, and listed as an intrinsic for MOVQ
    // so it should be atomic on aligned integers.
    _mm_loadl_epi64((const __m128i*)p);     // movq or movsd 64-bit load
    
    _mm_cvtsi64x_si128(a);      // only ICC, others refuse in 32-bit mode
    _mm_loadl_epi64((const __m128i*)&a);  // portable for a value instead of pointer
    

    Things based on _mm_set_epi32 can get compiled into a mess by some compilers, so _mm_loadl_epi64 appears to be the best bet across MSVC and ICC as well as gcc/clang, and it should actually be safe for your requirement of atomic 64-bit loads in 32-bit mode. See it on the Godbolt compiler explorer.
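
    Going the other way (a small addition of mine, not from the original answer): _mm_storel_epi64 is the matching MOVQ 64-bit store of the low element, and in 64-bit mode _mm_cvtsi128_si64 moves it straight to an integer register.

    _mm_storel_epi64((__m128i*)p, x);   // movq 64-bit store; should likewise be atomic on an aligned p
    int64_t v = _mm_cvtsi128_si64(x);   // movq to a GP register, but only in 64-bit mode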

    Vertically add/subtract each 64-bit integer:

    __m128i z = _mm_add_epi64(x,y)
    __m128i z = _mm_sub_epi64(x,y)
    

    http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse2_integer_arithmetic.htm#intref_sse2_integer_arithmetic

    Left Shift:

    __m128i z = _mm_slli_epi64(x,i)   // i must be an immediate
    

    http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse2_int_shift.htm
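
    A brief addition of mine, since the answer only shows the left shift: SSE2 also has a 64-bit logical right shift and register-count shift forms, but no 64-bit arithmetic right shift (that only arrived with AVX-512).

    __m128i z = _mm_srli_epi64(x,i);                      // logical right shift by an immediate
    __m128i z = _mm_sll_epi64(x, _mm_cvtsi32_si128(n));   // shift count taken from a variable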

    Bitwise operators:

    __m128i z = _mm_and_si128(x,y)
    __m128i z = _mm_or_si128(x,y)
    

    http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse2_integer_logical.htm
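
    XOR and ANDNOT are there too, and since there is no vector NOT you can fake it by XORing with all-ones (a brief addition of mine):

    __m128i z = _mm_xor_si128(x,y);
    __m128i z = _mm_andnot_si128(x,y);                   // (~x) & y
    __m128i z = _mm_xor_si128(x, _mm_set1_epi32(-1));    // ~x, via XOR with all-ones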

    SSE doesn't have an increment instruction, so you'll have to add a constant of 1 instead.
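
    For example (a one-line sketch of mine), incrementing both 64-bit elements:

    x = _mm_add_epi64(x, _mm_set1_epi64x(1));   // x += 1 in each 64-bit lane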


    Comparisons are harder since there's no 64-bit integer compare until SSE4.1 pcmpeqq and SSE4.2 pcmpgtq; on plain SSE2 you have to build them out of 32-bit compares, as shown below.
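
    (An aside of mine, assuming you can require SSE4.1/SSE4.2 hardware, which the question probably can't: the direct intrinsics would be)

    __m128i eq = _mm_cmpeq_epi64(a,b);   // SSE4.1 pcmpeqq
    __m128i gt = _mm_cmpgt_epi64(a,b);   // SSE4.2 pcmpgtq, signed 64-bit greater-than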

    Here's the one for equality:

    __m128i t = _mm_cmpeq_epi32(a,b);                       // compare the 32-bit halves
    __m128i z = _mm_and_si128(t,_mm_shuffle_epi32(t,177));  // 177 = 0xB1 swaps the halves within each 64-bit element; both must match
    

    This will set each 64-bit element to 0xFFFFFFFFFFFFFFFF (aka -1) if they are equal. If you want it as a 0 or 1 in an int, you can pull the low element out with _mm_cvtsi128_si32() and mask it with & 1, as the functions at the bottom do. (But sometimes you can do total -= cmp_result; instead of converting, since the mask is 0 or -1; see the sketch below.)
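
    As a sketch of that last trick (my example, not from the original answer): the mask is 0 or -1 in each element, so subtracting it from a running total counts equal pairs with no scalar conversion at all.

    __m128i total = _mm_setzero_si128();
    // per pair of vectors, inside the loop:
    __m128i t = _mm_cmpeq_epi32(a,b);
    __m128i m = _mm_and_si128(t,_mm_shuffle_epi32(t,177));
    total = _mm_sub_epi64(total, m);    // subtracting -1 adds 1 to the matching lane
    // horizontal-sum the two 64-bit lanes of total at the end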

    And Less-Than: (not fully tested)

    a = _mm_xor_si128(a,_mm_set1_epi32(0x80000000));   // flip every sign bit so the signed 32-bit compares act as unsigned
    b = _mm_xor_si128(b,_mm_set1_epi32(0x80000000));
    __m128i t = _mm_cmplt_epi32(a,b);                   // per-half a < b
    __m128i u = _mm_cmpgt_epi32(a,b);                   // per-half a > b
    __m128i z = _mm_or_si128(t,_mm_shuffle_epi32(t,177));   // 177 = 0xB1: OR the low- and high-half results within each 64-bit element
    z = _mm_andnot_si128(_mm_shuffle_epi32(u,245),z);       // 245 = 0xF5 broadcasts the high-half greater-than; clear z where it is set
    

    This will set each 64-bit element to 0xFFFFFFFFFFFFFFFF if the corresponding element in a is less than the one in b. Since the sign bits of all four 32-bit elements are flipped first, this behaves as an unsigned 64-bit comparison; for a signed one, flip only the low 32-bit half of each element so the high halves still compare as signed.


    Here are versions of "equals" and "less-than" that return a bool. They return the result of the comparison for the bottom 64-bit integer.

    inline bool equals(__m128i a,__m128i b){
        __m128i t = _mm_cmpeq_epi32(a,b);
        __m128i z = _mm_and_si128(t,_mm_shuffle_epi32(t,177));
        return _mm_cvtsi128_si32(z) & 1;
    }
    inline bool lessthan(__m128i a,__m128i b){
        a = _mm_xor_si128(a,_mm_set1_epi32(0x80000000));
        b = _mm_xor_si128(b,_mm_set1_epi32(0x80000000));
        __m128i t = _mm_cmplt_epi32(a,b);
        __m128i u = _mm_cmpgt_epi32(a,b);
        __m128i z = _mm_or_si128(t,_mm_shuffle_epi32(t,177));
        z = _mm_andnot_si128(_mm_shuffle_epi32(u,245),z);
        return _mm_cvtsi128_si32(z) & 1;
    }
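
    A quick usage sketch (mine, assuming the two inline helpers above are in scope and this is compiled as C++):

    #include <emmintrin.h>
    #include <cstdio>

    int main(){
        __m128i a = _mm_set_epi64x(0, 3);   // bottom 64-bit element = 3
        __m128i b = _mm_set_epi64x(0, 7);   // bottom 64-bit element = 7
        std::printf("equals:   %d\n", equals(a,b)   ? 1 : 0);   // prints 0
        std::printf("lessthan: %d\n", lessthan(a,b) ? 1 : 0);   // prints 1
        return 0;
    }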
    