I am attempting to speed up this for loop with OpenMP parallelization. I was under the impression that this should split up the work across a number of threads. However, p
Assuming you don't have a race condition, you can try fusing the loops. Fusing gives the parallel loop many more iterations to divide among the threads, which can help reduce the effect of false sharing and will likely distribute the load better as well.
For a triple loop like this
    for(int i2=0; i2<x; i2++) {
        for(int j2=0; j2<y; j2++) {
            for(int k2=0; k2<z; k2++) {
                //
            }
        }
    }
you can fuse it like this
    #pragma omp parallel for
    for(int n=0; n<(x*y*z); n++) {
        int i2 = n/(y*z);
        int j2 = (n%(y*z))/z;
        int k2 = (n%(y*z))%z;
        //
    }
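If you want to convince yourself that the index arithmetic is right before touching real code, here is a minimal sketch that records the triples produced by the fused loop and compares them against the plain nested loops. The function name `fused_matches_nested` is made up for this illustration, and I use `long long` for the fused counter on the assumption that `x*y*z` might not fit in an `int` (this needs OpenMP 3.0 or later; without OpenMP the pragma is simply ignored and the check still runs serially).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the original code: returns 1 if the
   fused loop yields the same (i2, j2, k2) triples, in the same order,
   as the three nested loops, and 0 otherwise. */
int fused_matches_nested(int x, int y, int z)
{
    assert(x > 0 && y > 0 && z > 0);          /* sketch assumes non-empty loops */

    long long yz = (long long)y * z;
    long long total = (long long)x * yz;      /* widened to avoid int overflow */
    int *idx = malloc((size_t)total * 3 * sizeof *idx);
    if (!idx) return 0;

    /* Fused loop: each iteration writes only its own three slots,
       so there is no race between threads. */
    #pragma omp parallel for
    for (long long n = 0; n < total; n++) {
        idx[3*n + 0] = (int)(n / yz);
        idx[3*n + 1] = (int)((n % yz) / z);
        idx[3*n + 2] = (int)(n % z);          /* equals (n % yz) % z */
    }

    /* Serial reference: walk the nested loops and compare triple by triple. */
    int ok = 1;
    long long n = 0;
    for (int i2 = 0; i2 < x; i2++)
        for (int j2 = 0; j2 < y; j2++)
            for (int k2 = 0; k2 < z; k2++, n++)
                if (idx[3*n] != i2 || idx[3*n + 1] != j2 || idx[3*n + 2] != k2)
                    ok = 0;

    free(idx);
    return ok;
}
```

Calling it with a few small, unequal sizes (say `fused_matches_nested(3, 4, 5)`) is usually enough to catch a swapped `y` and `z`.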
In your case, you can do it like this:
    int i, j, k, n;
    int x = newNx%2 ? newNx/2+1 : newNx/2;  // number of odd i values in 1..newNx
    int y = newNy;
    int z = newNz;
    #pragma omp parallel for private(i, j, k)
    for(n=0; n<(x*y*z); n++) {
        i = 2*(n/(y*z)) + 1;    // i = 1, 3, 5, ...
        j = (n%(y*z))/z + 1;    // j = 1..newNy
        k = (n%(y*z))%z + 1;    // k = 1..newNz
        // rest of code
    }
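As a sanity check, here is a rough sketch that compares the fused version against nested loops. I am assuming your original loops run `i = 1, 3, 5, ...` up to `newNx` and `j`, `k` from 1 to `newNy` and `newNz`, which is what the index formulas above imply; adjust if yours differ. `cell_value`, `nested_sum` and `fused_sum` are hypothetical names, with `cell_value` standing in for "rest of code". Declaring `i`, `j`, `k` inside the loop body makes them private automatically, so no `private` clause is needed, and the `reduction` clause is only there because this stand-in accumulates a sum.

```c
/* Hypothetical stand-in for the real per-cell work. */
static long long cell_value(int i, int j, int k)
{
    return 1000000LL * i + 1000LL * j + k;
}

/* Reference version: assumed shape of the original nested loops. */
long long nested_sum(int newNx, int newNy, int newNz)
{
    long long sum = 0;
    for (int i = 1; i <= newNx; i += 2)
        for (int j = 1; j <= newNy; j++)
            for (int k = 1; k <= newNz; k++)
                sum += cell_value(i, j, k);
    return sum;
}

/* Fused version: one flat loop over all (i, j, k) combinations. */
long long fused_sum(int newNx, int newNy, int newNz)
{
    long long x = (newNx + 1) / 2;   /* same as newNx%2 ? newNx/2+1 : newNx/2 */
    long long y = newNy;
    long long z = newNz;
    long long total = x * y * z;     /* long long, in case the product is large */
    long long sum = 0;

    #pragma omp parallel for reduction(+:sum)
    for (long long n = 0; n < total; n++) {
        int i = (int)(2 * (n / (y * z)) + 1);
        int j = (int)((n % (y * z)) / z + 1);
        int k = (int)(n % z + 1);    /* equals (n % (y*z)) % z + 1 */
        sum += cell_value(i, j, k);
    }
    return sum;
}
```

If `fused_sum` and `nested_sum` agree for a handful of odd and even `newNx` values, the mapping is very likely right and you can drop your real loop body in.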
If this successfully speeds up your code, then you can feel good that you made your code faster and at the same time obfuscated it even further.