I'm trying to get my code to auto-vectorize, but it isn't working.
int _tmain(int argc, _TCHAR* argv[])
{
const int N = 4096;
float x[N];
float
One problem could be that your compiler doesn't necessarily align stack allocations to 16 bytes. If your compiler supports C++11, you could use:
alignas(16) float x[N];
alignas(16) float y[N];
This explicitly gives you 16-byte-aligned memory, which most SSE memory operations require.
EDIT:
Even if alignment isn't the issue and your compiler is vectorizing unaligned code, you should still make this optimization: unaligned SSE operations can be much slower than their aligned counterparts, especially on older CPUs.
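As a rough sketch of the alignment suggestion above, assuming a C++11 compiler: the alignas specifier is written before the type, and a small check confirms that the arrays really start on a 16-byte boundary. The names is_aligned and aligned_dot are made up for this example, and the fill values are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const int N = 4096;

// True when p sits on a `bytes`-byte boundary (bytes must be a power of two).
bool is_aligned(const void* p, std::size_t bytes) {
    assert(bytes != 0 && (bytes & (bytes - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (bytes - 1)) == 0;
}

// Dot product over two stack arrays that are explicitly 16-byte aligned.
float aligned_dot() {
    alignas(16) float x[N];
    alignas(16) float y[N];

    // Arbitrary fill values so the result is well defined.
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    // Both arrays should start on a 16-byte boundary.
    assert(is_aligned(x, 16) && is_aligned(y, 16));

    float sum = 0.0f;
    for (int i = 0; i < N; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Whether the compiler then emits aligned loads is still its own decision; the specifier only guarantees the address.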
The error 1305 happens because the optimizer did not vectorize the loop, since the value of sum is never used. Simply adding printf("%f\n", sum) fixes that. But then you get a new error code, 1105: "Loop includes a non-recognized reduction operation". To fix this you need to set /fp:fast.
The reason is that floating-point arithmetic is not associative, and reductions using SIMD or MIMD (i.e. using multiple threads) change the order of the additions, so they need the operation to be associative. By using a looser floating-point model you allow the compiler to do the reduction.
I just tested it with the following code: with the default /fp:precise it does not vectorize, and with /fp:fast it does.
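// To see the vectorizer's report with MSVC, compile with e.g. cl /O2 /fp:fast /Qvec-report:2
#include <stdio.h>
int main() {
    const int N = 4096;
    // x and y are left uninitialized, so the printed value is meaningless;
    // only the vectorization report matters here.
    float x[N];
    float y[N];
    float sum = 0;
    // Dot-product reduction: vectorized under /fp:fast, not under /fp:precise.
    for (int i = 0; i < N; i++) {
        sum += x[i] * y[i];
    }
    // Using sum keeps the optimizer from discarding the loop.
    printf("sum %f\n", sum);
}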
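To make the non-associativity point concrete, here is a small sketch (not the compiler's actual code generation) that compares a strictly sequential sum with a sum that keeps four independent partial sums, the way a 4-wide SIMD reduction would. The function names and the input values are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Sequential reduction: the evaluation order a strict floating-point model must keep.
float sum_sequential(const float* v, std::size_t n) {
    assert(v != nullptr || n == 0);
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        s += v[i];
    }
    return s;
}

// Four independent partial sums, mimicking what a 4-wide SIMD reduction does.
float sum_four_lanes(const float* v, std::size_t n) {
    assert(v != nullptr || n == 0);
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (int k = 0; k < 4; ++k) {
            lane[k] += v[i + k];
        }
    }
    // Combine the lanes, then handle any leftover elements.
    float s = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    for (; i < n; ++i) {
        s += v[i];
    }
    return s;
}
```

With strict IEEE single-precision evaluation, the input {1e8f, 1, 1, 1, -1e8f, 1, 1, 1} gives 3 from the sequential version and 6 from the four-lane version, because each 1 added directly to 1e8f is rounded away. The compiler is only allowed to switch between these two orders when the floating-point model permits it.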
Regarding your question about the loop with the rand() function: rand() is not a SIMD function. It is an opaque call with internal state, so a loop that calls it can't be vectorized. You need to find a SIMD rand() function. I don't know of one. An alternative is to pre-compute an array of random numbers and use the array instead. In any case, rand() is a horrible random number generator and is only useful for some toy cases. Consider using the Mersenne Twister PRNG.
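A minimal sketch of the pre-compute idea, assuming C++11 and its standard Mersenne Twister engine std::mt19937. The helper names make_random_array and add_noise are hypothetical, and the [0, 1] range is an arbitrary choice. The random numbers are generated once up front, so the hot loop only reads arrays and contains no function call.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: generate n pseudo-random floats ahead of time.
std::vector<float> make_random_array(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);  // Mersenne Twister engine from <random>
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> r(n);
    for (std::size_t i = 0; i < n; ++i) {
        r[i] = dist(gen);
    }
    return r;
}

// Hot loop: element-wise add over plain arrays, with no call to a PRNG inside.
void add_noise(float* x, const float* r, std::size_t n) {
    assert(n == 0 || (x != nullptr && r != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        x[i] += r[i];
    }
}
```

The generation loop itself is still scalar; the point is only that the loop you want vectorized no longer depends on it.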