I\'m trying to optimize a computation-intensive algorithm and am kind of stuck at some cache problem. I have a huge buffer which is written occasionally and at random and read o
Shouldn't func4 be this:
void func4() { __m128 buf = _mm_setr_ps(5.0f, 5.0f, 5.0f, 5.0f); for(int i = 0; i < length; i += 16) { _mm_stream_ps(&arr[i], buf); _mm_stream_ps(&arr[i+4], buf); _mm_stream_ps(&arr[i+8], buf); _mm_stream_ps(&arr[i+12], buf); } }