C++ Adding 2 arrays together quickly

一整个雨季 2021-02-04 16:57

Given the arrays:

int canvas[10][10];
int addon[10][10];

Where all the values range from 0 - 100, what is the fastest way in C++ to add

6 answers

南笙 (original poster)

2021-02-04 17:34

Here is an SSE4 implementation that should perform pretty well on Nehalem (Core i7):

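#include <limits.h>     // INT_MIN
#include <emmintrin.h>  // SSE2
#include <smmintrin.h>  // SSE4.1

// Adds addon to canvas in place, 4 ints per iteration.
// Returns nonzero if any resulting element exceeds 100.
static inline int canvas_add(int canvas[10][10], int addon[10][10])
{
    __m128i * cp = (__m128i *)&canvas[0][0];
    const __m128i * ap = (__m128i *)&addon[0][0];
    const __m128i vlimit = _mm_set1_epi32(100);
    __m128i vmax = _mm_set1_epi32(INT_MIN);  // running per-lane maximum
    __m128i vcmp;
    int cmp;
    int i;

    // 100 ints = 25 vectors of 4, so the loop covers the arrays exactly
    for (i = 0; i < 10 * 10; i += 4)
    {
        __m128i vc = _mm_loadu_si128(cp);
        __m128i va = _mm_loadu_si128(ap);

        vc = _mm_add_epi32(vc, va);
        vmax = _mm_max_epi32(vmax, vc);   // SSE4 *

        _mm_storeu_si128(cp, vc);

        cp++;
        ap++;
    }
    vcmp = _mm_cmpgt_epi32(vmax, vlimit); // SSE2: all-ones in each lane where max > 100
    cmp = _mm_testz_si128(vcmp, vcmp);    // SSE4 * (1 if vcmp is all zero)
    return cmp == 0;
}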

Compile with gcc -msse4.1 ... or equivalent for your particular development environment.
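A plain scalar loop with the same contract can serve as a baseline and as a correctness check for the SIMD version. This is only a sketch: the name canvas_add_scalar is hypothetical, and the contract (add addon into canvas in place, report whether any sum exceeds 100) is inferred from the SSE4 code rather than stated in the question. The overflow flag is accumulated without branching, which may make it easier for an optimizing compiler to vectorize the loop on its own.

```cpp
// Hypothetical scalar baseline (not part of the original answer).
// Adds addon into canvas in place; returns true if any sum exceeds 100.
static inline bool canvas_add_scalar(int canvas[10][10], int addon[10][10])
{
    bool over = false;
    for (int r = 0; r < 10; ++r) {
        for (int c = 0; c < 10; ++c) {
            canvas[r][c] += addon[r][c];
            over |= canvas[r][c] > 100;  // branch-free accumulation of the flag
        }
    }
    return over;
}
```

Running both versions on copies of the same input and comparing the resulting arrays and return values is a quick way to validate the intrinsics code.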

For older CPUs without SSE4 (and with much more expensive misaligned loads/stores) you'll need to (a) use a suitable combination of SSE2/SSE3 intrinsics to replace the SSE4 operations (marked with an * above) and ideally (b) make sure your data is 16-byte aligned and use aligned loads/stores (_mm_load_si128/_mm_store_si128) in place of _mm_loadu_si128/_mm_storeu_si128.
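One possible SSE2-only variant along those lines is sketched below, assuming an x86 target and that both arrays are declared with 16-byte alignment (for example with alignas(16)). The name canvas_add_sse2 is hypothetical. Instead of tracking a running maximum, it ORs together the per-vector results of _mm_cmpgt_epi32 and checks the accumulated mask with _mm_movemask_epi8, so no SSE4 instruction is needed. Since 100 ints occupy 400 bytes, an aligned base address keeps every 16-byte chunk aligned.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical SSE2-only variant (not part of the original answer).
// Precondition: canvas and addon are both 16-byte aligned.
// Adds addon into canvas in place; returns true if any sum exceeds 100.
static inline bool canvas_add_sse2(int canvas[10][10], int addon[10][10])
{
    __m128i* cp = reinterpret_cast<__m128i*>(&canvas[0][0]);
    const __m128i* ap = reinterpret_cast<const __m128i*>(&addon[0][0]);
    const __m128i vlimit = _mm_set1_epi32(100);
    __m128i vover = _mm_setzero_si128();  // lanes become all-ones once a sum exceeds 100

    for (int i = 0; i < 10 * 10; i += 4) {
        __m128i vc = _mm_load_si128(cp);   // aligned load
        __m128i va = _mm_load_si128(ap);   // aligned load
        vc = _mm_add_epi32(vc, va);
        vover = _mm_or_si128(vover, _mm_cmpgt_epi32(vc, vlimit));
        _mm_store_si128(cp, vc);           // aligned store
        ++cp;
        ++ap;
    }
    // Any set byte in the mask means at least one lane exceeded the limit.
    return _mm_movemask_epi8(vover) != 0;
}
```

The caller would declare the arrays as alignas(16) int canvas[10][10]; and alignas(16) int addon[10][10]; passing unaligned arrays to the aligned load/store intrinsics is not safe.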
