Why does _mm_stream_ps produce L1/LL cache misses?

轻奢々 2021-02-03 13:35

I'm trying to optimize a computation-intensive algorithm and am stuck on a cache problem. I have a huge buffer which is written occasionally and at random and read o

2 answers
  •  闹比i (original poster)
    2021-02-03 14:14

    Shouldn't func4 be this:

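    #include <xmmintrin.h>  /* _mm_setr_ps, _mm_stream_ps */

    /* arr and length are defined elsewhere (not shown here). arr must be
       16-byte aligned, and length is assumed to be a multiple of 16. */
    void func4() {
        __m128 buf = _mm_setr_ps(5.0f, 5.0f, 5.0f, 5.0f);
        /* 16 floats = 64 bytes: one full cache line per iteration on
           CPUs with 64-byte lines. */
        for(int i = 0; i < length; i += 16) {
            _mm_stream_ps(&arr[i], buf);
            _mm_stream_ps(&arr[i+4], buf);
            _mm_stream_ps(&arr[i+8], buf);
            _mm_stream_ps(&arr[i+12], buf);
        }
    }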
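
For anyone who wants to try this outside the original program, here is a minimal self-contained sketch of the same idea. The name `stream_fill` and its parameters are hypothetical, not from the question. It assumes a 64-byte cache line, a destination that is at least 16-byte aligned, and a length that is a multiple of 16 floats. The point is that all four 16-byte pieces of each line get written back to back, so the write-combining buffer can be flushed as a whole line rather than partially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_set1_ps, _mm_stream_ps, _mm_sfence */

/* Hypothetical helper: fill n floats at dst with value using
   non-temporal (streaming) stores.
   Assumptions: dst is at least 16-byte aligned (ideally 64-byte aligned,
   so each iteration covers exactly one cache line) and n is a multiple
   of 16. */
static void stream_fill(float *dst, size_t n, float value)
{
    assert(((uintptr_t)dst % 16) == 0);  /* _mm_stream_ps needs 16-byte alignment */
    assert(n % 16 == 0);                 /* the loop has no tail handling */

    const __m128 v = _mm_set1_ps(value); /* same value in all four lanes */

    for (size_t i = 0; i < n; i += 16) {
        /* Four 16-byte stores = 64 bytes, written consecutively. */
        _mm_stream_ps(dst + i,      v);
        _mm_stream_ps(dst + i + 4,  v);
        _mm_stream_ps(dst + i + 8,  v);
        _mm_stream_ps(dst + i + 12, v);
    }

    /* Streaming stores are weakly ordered; fence before other code
       (or other threads) relies on the data being globally visible. */
    _mm_sfence();
}
```

Call it on a buffer you allocated with suitable alignment, for example `stream_fill(buf, 64, 5.0f)` on a 64-byte-aligned array of 64 floats. Whether this actually removes the misses you see in your profiler depends on the CPU and on how the tool counts non-temporal stores, so measure rather than assume.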
    
