SIMD-ize the following code

执念已碎 2021-02-05 16:23

How do I SIMD-ize the following code in C (using SIMD intrinsics, of course)? I am having trouble understanding SIMD intrinsics, and this would help a lot:

int sum         


        
1 answer
  •  一向 (OP)
    2021-02-05 16:49

    Here's a fairly straightforward implementation (warning: untested code):

    #include <emmintrin.h>  // SSE2 intrinsics (__m128i, _mm_load_si128, ...)
    #include <stdint.h>     // int32_t

    int32_t sum_array(const int32_t a[], const int n)
    {
        __m128i vsum = _mm_set1_epi32(0);       // initialise vector of four partial 32 bit sums
        int32_t sum;
        int i;
    
        for (i = 0; i < n; i += 4)
        {
            // load vector of 4 x 32 bit values; &a[i] must be 16 byte aligned
            __m128i v = _mm_load_si128((const __m128i *)&a[i]);
            vsum = _mm_add_epi32(vsum, v);      // accumulate to 32 bit partial sum vector
        }
        // horizontal add of four 32 bit partial sums and return result
        vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 8));
        vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 4));
        sum = _mm_cvtsi128_si32(vsum);          // extract lane 0
        return sum;
    }
    

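    Note that the input array, a[], needs to be 16-byte aligned (a requirement of _mm_load_si128; _mm_loadu_si128 accepts unaligned addresses), and n must be a multiple of 4, because the loop has no code for leftover elements.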
