I am experimenting with OpenMP. I wrote some code to check its performance. On a 4-core single Intel CPU with Kubuntu 11.04, the following program compiled with OpenMP is around
Fastest code:
for (int i = 0; i < 100000000; i ++) {;}
Slightly slower code:
#pragma omp parallel for num_threads(1)
for (int i = 0; i < 100000000; i ++) {;}
2-3 times slower code:
#pragma omp parallel for
for (int i = 0; i < 100000000; i ++) {;}
This holds no matter what is between { and }: a simple ; or a more complex computation gives the same results. I compiled under Ubuntu 13.10 64-bit with both gcc and g++, trying different flags (-ansi -pedantic-errors -Wall -Wextra -O3), and ran on an Intel quad-core at 3.5 GHz.
I guess thread management overhead is at fault? It doesn't seem smart for OpenMP to create a thread every time one is needed and destroy it afterwards. I expected there to be four (or eight) threads that are either running when needed or sleeping.