Shift a __m128i of n bits

暖寄归人 2021-02-19 16:39

I have a __m128i variable and I need to shift its 128-bit value by n bits, i.e. the way _mm_srli_si128 and _mm_slli_si128 work, but on bits i

1 answer
  •  迷失自我
    2021-02-19 16:46

    This is the best that I could come up with for left/right immediate shifts with SSE2:

    #include <stdio.h>      /* printf */
    #include <emmintrin.h>  /* SSE2 intrinsics */
    
    /* Shift the 128-bit value v left by n bits; n should be a compile-time constant. */
    #define SHL128(v, n) \
    ({ \
        __m128i v1, v2; \
     \
        if ((n) >= 64) \
        { \
            v1 = _mm_slli_si128(v, 8); \
            v1 = _mm_slli_epi64(v1, (n) - 64); \
        } \
        else \
        { \
            v1 = _mm_slli_epi64(v, n); \
            v2 = _mm_slli_si128(v, 8); \
            v2 = _mm_srli_epi64(v2, 64 - (n)); \
            v1 = _mm_or_si128(v1, v2); \
        } \
        v1; \
    })
    
    /* Shift the 128-bit value v right (logical) by n bits; n should be a compile-time constant. */
    #define SHR128(v, n) \
    ({ \
        __m128i v1, v2; \
     \
        if ((n) >= 64) \
        { \
            v1 = _mm_srli_si128(v, 8); \
            v1 = _mm_srli_epi64(v1, (n) - 64); \
        } \
        else \
        { \
            v1 = _mm_srli_epi64(v, n); \
            v2 = _mm_srli_si128(v, 8); \
            v2 = _mm_slli_epi64(v2, 64 - (n)); \
            v1 = _mm_or_si128(v1, v2); \
        } \
        v1; \
    })
    
    int main(void)
    {
        __m128i va = _mm_setr_epi8(0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f);
        __m128i vb, vc;
    
        vb = SHL128(va, 4);
        vc = SHR128(va, 4);
    
        printf("va = %02vx\n", va);
        printf("vb = %02vx\n", vb);
        printf("vc = %02vx\n", vc);
        printf("\n");
    
        vb = SHL128(va, 68);
        vc = SHR128(va, 68);
    
        printf("va = %02vx\n", va);
        printf("vb = %02vx\n", vb);
        printf("vc = %02vx\n", vc);
    
        return 0;
    }
    

    Test:

    $ gcc -Wall -msse2 shift128.c && ./a.out
    va = 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
    vb = 00 10 20 30 40 50 60 70 80 90 a0 b0 c0 d0 e0 f0
    vc = 10 20 30 40 50 60 70 80 90 a0 b0 c0 d0 e0 f0 00
    
    va = 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
    vb = 00 00 00 00 00 00 00 00 00 10 20 30 40 50 60 70
    vc = 90 a0 b0 c0 d0 e0 f0 00 00 00 00 00 00 00 00 00
    $ 
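
The bytes printed above can be cross-checked with a scalar model that treats the vector as two 64-bit halves. This is only a sketch: the u128_ref type and the ref_shl128/ref_shr128 names are hypothetical and not part of the answer, and it assumes the x86 little-endian layout in which byte 0 of the vector (the first argument of _mm_setr_epi8) is the least significant byte.

```c
#include <stdint.h>

/* Hypothetical reference type: lo holds bytes 0..7, hi holds bytes 8..15. */
typedef struct { uint64_t lo, hi; } u128_ref;

/* Scalar 128-bit left shift by n bits, 0 <= n <= 127. */
static u128_ref ref_shl128(u128_ref x, unsigned n)
{
    u128_ref r;
    if (n >= 64) {
        r.lo = 0;
        r.hi = x.lo << (n - 64);
    } else if (n == 0) {
        r = x;  /* a 64-bit shift by 64 is undefined in C, so handle 0 separately */
    } else {
        r.lo = x.lo << n;
        r.hi = (x.hi << n) | (x.lo >> (64 - n));  /* carry bits from lo into hi */
    }
    return r;
}

/* Scalar 128-bit logical right shift by n bits, 0 <= n <= 127. */
static u128_ref ref_shr128(u128_ref x, unsigned n)
{
    u128_ref r;
    if (n >= 64) {
        r.hi = 0;
        r.lo = x.hi >> (n - 64);
    } else if (n == 0) {
        r = x;
    } else {
        r.hi = x.hi >> n;
        r.lo = (x.lo >> n) | (x.hi << (64 - n));  /* carry bits from hi into lo */
    }
    return r;
}
```

With lo = 0x0706050403020100 and hi = 0x0f0e0d0c0b0a0908 (the value of va), ref_shl128 with n = 4 gives lo = 0x7060504030201000 and hi = 0xf0e0d0c0b0a09080, which is the first vb line when read byte by byte from the low end.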
    

    Note that the SHL128/SHR128 macros rely on the GNU statement-expression extension (the ({ ... }) construct), which gcc, clang and some other compilers support. If your compiler does not support this extension, the macros will need to be adapted.
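
One possible adaptation for compilers without statement expressions is a pair of plain inline functions. The sketch below is not the answer's code: the names shl128 and shr128 are hypothetical, and it swaps the immediate-count intrinsics _mm_slli_epi64/_mm_srli_epi64 for the register-count variants _mm_sll_epi64/_mm_srl_epi64, so that n can be an ordinary function argument. It assumes 0 <= n <= 127 and relies on the SSE2 rule that a 64-bit lane shift by more than 63 yields zero.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical function form of SHL128: shift v left by n bits, 0 <= n <= 127. */
static inline __m128i shl128(__m128i v, int n)
{
    if (n >= 64) {
        /* Move the low half into the high half, then shift by the remainder. */
        __m128i hi = _mm_slli_si128(v, 8);
        return _mm_sll_epi64(hi, _mm_cvtsi32_si128(n - 64));
    }
    /* Shift both 64-bit lanes, then OR in the bits carried out of the low lane. */
    __m128i v1 = _mm_sll_epi64(v, _mm_cvtsi32_si128(n));
    __m128i v2 = _mm_srl_epi64(_mm_slli_si128(v, 8), _mm_cvtsi32_si128(64 - n));
    return _mm_or_si128(v1, v2);
}

/* Hypothetical function form of SHR128: logical right shift by n bits, 0 <= n <= 127. */
static inline __m128i shr128(__m128i v, int n)
{
    if (n >= 64) {
        /* Move the high half into the low half, then shift by the remainder. */
        __m128i lo = _mm_srli_si128(v, 8);
        return _mm_srl_epi64(lo, _mm_cvtsi32_si128(n - 64));
    }
    /* Shift both 64-bit lanes, then OR in the bits carried out of the high lane. */
    __m128i v1 = _mm_srl_epi64(v, _mm_cvtsi32_si128(n));
    __m128i v2 = _mm_sll_epi64(_mm_srli_si128(v, 8), _mm_cvtsi32_si128(64 - n));
    return _mm_or_si128(v1, v2);
}
```

Calls look like vb = shl128(va, 4); and vc = shr128(va, 68);. The trade-off is that the shift count travels through a vector register, so this form may cost an extra instruction or two compared with the macros when n is a constant.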

    Note also that the printf extension for SIMD types used in the test harness (the v modifier, as in %02vx) works with Apple gcc, clang and others. If your compiler does not support it and you want to test the code, you will need to implement your own SIMD print routines.
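
A print routine of that kind could look like the sketch below. The names format_m128i and print_m128i are hypothetical; the code stores the vector to a byte array with _mm_storeu_si128 and prints the bytes in memory order (byte 0 first), which is the same layout as the test output in this answer.

```c
#include <stdio.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: write the 16 bytes of v into buf as two-digit hex,
   separated by spaces, lowest byte first. 48 bytes of buf hold the full string. */
static void format_m128i(char *buf, size_t size, __m128i v)
{
    uint8_t b[16];
    size_t pos = 0;

    _mm_storeu_si128((__m128i *)b, v);  /* unaligned store, safe for any buffer */
    for (int i = 0; i < 16 && pos < size; i++) {
        int w = snprintf(buf + pos, size - pos, i ? " %02x" : "%02x", b[i]);
        if (w < 0)
            break;                      /* encoding error: stop formatting */
        pos += (size_t)w;
    }
}

/* Hypothetical helper: print "name = xx xx ... xx" followed by a newline. */
static void print_m128i(const char *name, __m128i v)
{
    char buf[48];

    format_m128i(buf, sizeof buf, v);
    printf("%s = %s\n", name, buf);
}
```

In the test harness, a line such as printf("va = %02vx\n", va); would then become print_m128i("va", va);.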

    Note on performance: the if/else branch is optimised out as long as n is a compile-time constant (which it needs to be anyway for the shift intrinsics), leaving 2 instructions for the n >= 64 case and 4 instructions for the n < 64 case.
