C++ Adding 2 arrays together quickly

一整个雨季 2021-02-04 16:57

Given the arrays:

int canvas[10][10];
int addon[10][10];

Where all the values range from 0 to 100, what is the fastest way in C++ to add addon to canvas element-wise, so that canvas[i][j] += addon[i][j] for every cell?
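For reference, the straightforward scalar version of this element-wise addition is just a pair of nested loops. This is a minimal sketch, not code from the thread; add_arrays is a hypothetical name. At -O3, GCC and Clang will typically auto-vectorize a loop like this, so it is a sensible baseline to time any hand-written SIMD against.

```cpp
// Hypothetical scalar baseline: canvas[r][c] += addon[r][c] for a 10x10 grid.
// The rows are contiguous in memory, so optimizing compilers can usually
// turn the inner loop into vector instructions on their own.
inline void add_arrays(int canvas[10][10], const int addon[10][10])
{
    for (int r = 0; r < 10; ++r)
        for (int c = 0; c < 10; ++c)
            canvas[r][c] += addon[r][c];
}
```

Call it as add_arrays(canvas, addon); canvas is updated in place.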

  •  南笙
    2021-02-04 17:34

    Here is an SSE4.1 implementation that should perform well on Nehalem (Core i7). It adds addon into canvas in place and returns 1 if any resulting element exceeds 100, otherwise 0:

    #include <climits>      // INT_MIN
    #include <smmintrin.h>  // SSE4.1 intrinsics (pulls in the SSE2/SSE3 headers as well)
    
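    // Adds addon into canvas in place and returns 1 if any resulting
    // element exceeds 100, otherwise 0.
    static inline int canvas_add(int canvas[10][10], int addon[10][10])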
    {
        __m128i * cp = (__m128i *)&canvas[0][0];
        const __m128i * ap = (__m128i *)&addon[0][0];
        const __m128i vlimit = _mm_set1_epi32(100);  // limit broadcast to all 4 lanes
        __m128i vmax = _mm_set1_epi32(INT_MIN);      // running per-lane maximum of the sums
        __m128i vcmp;
        int cmp;
        int i;
    
        // 100 ints = 25 vectors of 4 x int32, so no scalar tail loop is needed
        for (i = 0; i < 10 * 10; i += 4)
        {
            // unaligned load/store: plain int arrays are not guaranteed 16-byte alignment
            __m128i vc = _mm_loadu_si128(cp);
            __m128i va = _mm_loadu_si128(ap);
    
            vc = _mm_add_epi32(vc, va);
            vmax = _mm_max_epi32(vmax, vc);   // SSE4 *
    
            _mm_storeu_si128(cp, vc);
    
            cp++;
            ap++;
        }
        vcmp = _mm_cmpgt_epi32(vmax, vlimit); // SSE2: all-ones in lanes where max > 100
        cmp = _mm_testz_si128(vcmp, vcmp);    // SSE4 * : returns 1 iff vcmp is all zeros
        return cmp == 0;                      // 1 if any sum exceeded 100
    }
    

    Compile with gcc -msse4.1 ... or equivalent for your particular development environment.

    For older CPUs without SSE4 (and with much more expensive misaligned loads/stores) you'll need to (a) use a suitable combination of SSE2/SSE3 intrinsics to replace the SSE4 operations (marked with an * above) and ideally (b) make sure your data is 16-byte aligned and use aligned loads/stores (_mm_load_si128/_mm_store_si128) in place of _mm_loadu_si128/_mm_storeu_si128.
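A minimal SSE2-only sketch of both suggestions follows. It is an illustration, not the answer's code: canvas_add_sse2 is a hypothetical name, and instead of a running _mm_max_epi32 followed by _mm_testz_si128 it ORs together the per-vector _mm_cmpgt_epi32 masks and tests the result with _mm_movemask_epi8, all of which are SSE2. It assumes both arrays are 16-byte aligned, because the aligned load/store intrinsics fault on misaligned addresses.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical SSE2 variant of canvas_add. Same contract: canvas += addon in
// place, returns 1 if any resulting element exceeds 100, otherwise 0.
// ASSUMPTION: both arrays start on a 16-byte boundary (e.g. alignas(16)).
static inline int canvas_add_sse2(int canvas[10][10], const int addon[10][10])
{
    __m128i *cp = reinterpret_cast<__m128i *>(&canvas[0][0]);
    const __m128i *ap = reinterpret_cast<const __m128i *>(&addon[0][0]);
    const __m128i vlimit = _mm_set1_epi32(100);
    __m128i vover = _mm_setzero_si128();   // OR of per-lane "sum > 100" masks

    for (int i = 0; i < 10 * 10 / 4; ++i)  // 25 vectors of 4 ints each
    {
        __m128i vc = _mm_load_si128(cp + i);   // aligned load
        __m128i va = _mm_load_si128(ap + i);
        vc = _mm_add_epi32(vc, va);
        // SSE2 compare + OR replaces the SSE4.1 running maximum
        vover = _mm_or_si128(vover, _mm_cmpgt_epi32(vc, vlimit));
        _mm_store_si128(cp + i, vc);           // aligned store
    }
    // SSE2 movemask replaces SSE4.1 _mm_testz_si128: nonzero if any lane was set
    return _mm_movemask_epi8(vover) != 0;
}
```

Declare the data as, for example, alignas(16) int canvas[10][10]; (and likewise addon), then call it exactly like canvas_add. If the arrays cannot be aligned, switching back to _mm_loadu_si128/_mm_storeu_si128 keeps the rest of the function unchanged.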
