Auto vectorization not working

名媛妹妹 2020-12-20 18:39

I'm trying to get my code to auto-vectorize, but it isn't working.

int _tmain(int argc, _TCHAR* argv[])
{
    const int N = 4096;
    float x[N];
    float         


        
2 Answers
  • 2020-12-20 19:35

    One problem could be that your compiler does not necessarily align stack allocations. If your compiler supports C++11, you could use:

    alignas(16) float x[N];
    alignas(16) float y[N];
    

    This explicitly requests 16-byte-aligned memory, which the aligned SSE load and store instructions require.


    EDIT:

    Even if alignment isn't the issue and your compiler is vectorizing unaligned code, you should still make this change: unaligned SSE loads and stores can be slower than their aligned counterparts, markedly so on older CPUs.
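To check that the alignment request actually took effect, something like the following could be used. This is only a minimal sketch: is_aligned_16 and aligned_dot_demo are hypothetical names that do not appear in the question, and the fill values 1.0f and 2.0f are arbitrary choices made so that the result is well defined.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original post): true when p sits on a
// 16-byte boundary, the alignment that aligned SSE loads and stores expect.
inline bool is_aligned_16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Dot product over two stack arrays that are explicitly 16-byte aligned.
// Returns -1.0f if either array turns out to be misaligned, which the
// alignas specifiers should rule out.
inline float aligned_dot_demo() {
    const int N = 4096;
    alignas(16) float x[N];
    alignas(16) float y[N];

    // Initialize both arrays so the sum below is well defined.
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    if (!is_aligned_16(x) || !is_aligned_16(y)) {
        return -1.0f;
    }

    float sum = 0.0f;
    for (int i = 0; i < N; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Calling aligned_dot_demo() should return 8192.0f (4096 products of 1.0f * 2.0f, all exactly representable). Whether the loop is actually vectorized still depends on the compiler and its flags.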

  • 2020-12-20 19:37

    Error 1305 occurs because the optimizer did not vectorize the loop, since the value of sum is never used. Simply adding printf("%f\n", sum) fixes that. You then get a new reason code, 1105: "Loop includes a non-recognized reduction operation". To fix this, you need to set /fp:fast.

    The reason is that floating-point arithmetic is not associative, while a reduction using SIMD or MIMD (i.e. multiple threads) reorders the operations and is only valid if they are associative. With a looser floating-point model, the compiler is allowed to perform the reduction.
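To make the associativity point concrete, here is a small sketch comparing a strict left-to-right sum with a sum that uses four partial accumulators, which is roughly the order a 4-wide SIMD reduction would use. The function names sum_in_order and sum_four_lanes and the test values are made up for illustration; this is not what the compiler literally emits.

```cpp
#include <cstddef>

// Sums strictly left to right, the order a precise floating-point model
// has to preserve.
inline float sum_in_order(const float* v, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; i++) {
        s += v[i];
    }
    return s;
}

// Sums with four partial accumulators that are combined at the end,
// mimicking the order of a 4-wide SIMD reduction. Leftover elements
// (when n is not a multiple of 4) are added to the first accumulator.
inline float sum_four_lanes(const float* v, std::size_t n) {
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (std::size_t k = 0; k < 4; k++) {
            lane[k] += v[i + k];
        }
    }
    for (; i < n; i++) {
        lane[0] += v[i];
    }
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

For the input {1e20f, 1, 1, 1, -1e20f, 1, 1, 1}, sum_in_order returns 3 because the first three 1s are absorbed by 1e20f before it cancels, while sum_four_lanes returns 6 because 1e20f and -1e20f meet in the same accumulator. Both are valid float computations that differ only in the order of the additions, and that reordering is exactly what the default floating-point model does not let the vectorizer do.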

    I just tested this with the following code: with the default /fp:precise the loop does not vectorize, and with /fp:fast it does.

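    #include <stdio.h>
    int main() {
        const int N = 4096;
        // Note: x and y are left uninitialized here, which is enough to test
        // whether the loop vectorizes; fill them before trusting the result.
        float x[N];
        float y[N];
        float sum = 0;
        // Dot-product reduction: vectorizes only with /fp:fast.
        for (int i = 0; i < N; i++){
            sum += x[i] * y[i];
        }
        // Using sum keeps the optimizer from discarding the loop.
        printf("sum %f\n", sum);
    }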
    

    Regarding your question about the loop that calls rand(): rand() is not a SIMD function, so a loop that calls it can't be vectorized. You would need to find a SIMD rand() function, and I don't know of one. An alternative is to pre-compute an array of random numbers and use the array instead. In any case, rand() is a horrible random number generator and is only useful for some toy cases. Consider using the Mersenne Twister PRNG.
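The pre-computed array idea could look roughly like this, using std::mt19937 (the Mersenne Twister from the C++11 <random> header). It is a sketch under assumptions: the helper names make_random_floats and dot, the seed parameter, and the 0 to 1 range are all illustrative and not taken from the question.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Fills a buffer with n pseudo-random floats drawn from the Mersenne
// Twister. This part is scalar, but it runs once, outside the hot loop.
// The seed parameter and the 0..1 range are illustrative choices.
inline std::vector<float> make_random_floats(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> out(n);
    for (float& v : out) {
        v = dist(gen);
    }
    return out;
}

// The hot loop only reads from plain arrays, so there is no rand() call
// left inside it to block vectorization.
inline float dot(const std::vector<float>& x, const std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    const float* px = x.data();
    const float* py = y.data();
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; i++) {
        sum += px[i] * py[i];
    }
    return sum;
}
```

For example, dot(make_random_floats(4096, 1u), make_random_floats(4096, 2u)) keeps the random number generation out of the reduction loop. The reduction itself is still subject to the floating-point model discussed above, so it would presumably still need /fp:fast to vectorize.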
