I am experimenting with OpenMP. I wrote some code to check its performance. On a 4-core single Intel CPU with Kubuntu 11.04, the following program compiled with OpenMP is around
The problem is that the variable k is considered to be a shared variable, so it has to be synced between the threads. A possible solution to avoid this is:
#include
#include
using namespace std;
int main ()
{
long double i=0;
#pragma omp parallel for reduction(+:i)
for(int t=1; t<30000000; t++){
long double k=0.7;
for(int n=1; n<16; n++){
i=i+pow(k,n);
}
}
cout << i<<"\t";
return 0;
}
Following the hint of Martin Beckett in the comment below, instead of declaring k inside the loop, you can also declare k const and outside the loop.
Otherwise, ejd is correct - the problem here does not seem bad parallelization, but bad optimization when the code is parallelized. Remember that the OpenMP implementation of gcc is pretty young and far from optimal.