I've been calling this in OpenMP:
#pragma omp parallel for num_threads(totalThreads)
for(unsigned i=0; i
Where does totalThreads come from in your OpenMP version? I bet it's not startIndex.size().
The OpenMP version queues the requests onto totalThreads worker threads. It looks like the C++11 version creates startIndex.size() threads, one per element, which involves a ridiculous amount of thread creation and teardown overhead if that's a big number.
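For comparison, a C++11 version that behaves more like num_threads(totalThreads) creates a fixed number of std::threads and gives each one a contiguous chunk of the index range, instead of starting one thread per element. This is a minimal sketch: parallelForFixed is a hypothetical helper name, and splitting the range into equal contiguous chunks assumes a static-style schedule, which may not be what your OpenMP runtime actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: run body(i) for every i in [0, n) using at most
// numThreads std::threads. The range is split into contiguous chunks of
// ceil(n / numThreads) indices, roughly like a static OpenMP schedule.
void parallelForFixed(std::size_t n, unsigned numThreads,
                      const std::function<void(std::size_t)>& body)
{
    assert(numThreads > 0);
    std::vector<std::thread> workers;
    workers.reserve(numThreads);

    std::size_t chunk = (n + numThreads - 1) / numThreads;  // ceiling division
    for (unsigned t = 0; t < numThreads; ++t)
    {
        std::size_t begin = t * chunk;
        std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;  // fewer items than threads: stop early

        // Each worker handles its own slice of indices.
        workers.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }

    // body is captured by reference, so all workers must finish before returning.
    for (auto& w : workers) w.join();
}
```

In the question's code this might be called as parallelForFixed(startIndex.size(), totalThreads, [&](std::size_t i) { /* loop body */ }), where the loop body is whatever the truncated loop above does. Only totalThreads threads are created per call, no matter how large startIndex.size() is.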
Consider the following code. The OpenMP version (algorithmToRun == 2) runs in 0 seconds, while the C++11 version (algorithmToRun == 1) runs in 50 seconds. This is not because the function is doNothing, and it's not because the vector is constructed inside the loop. As you can imagine, the C++11 threads are created and then destroyed on every iteration of the outer loop. OpenMP, on the other hand, actually implements thread pools: the standard doesn't require it, but Intel's and AMD's implementations do it.
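#include <cstdlib>
#include <thread>
#include <vector>
using namespace std;

void doNothing() {}

int main(int argc, char* argv[])
{
    // 1 = create and join std::threads every iteration, 2 = OpenMP (chosen on the command line here)
    int algorithmToRun = (argc > 1) ? atoi(argv[1]) : 1;

    for(int j=1; j<100000; ++j)
    {
        if(algorithmToRun == 1)
        {
            // 16 new threads are created and then destroyed on each iteration
            vector<thread> threads;
            for(int i=0; i<16; i++)
            {
                threads.emplace_back(doNothing);
            }
            for(auto& t : threads) t.join();
        }
        else if(algorithmToRun == 2)
        {
            // typical OpenMP runtimes reuse their worker threads across parallel regions
            #pragma omp parallel for num_threads(16)
            for(unsigned i=0; i<16; i++)
            {
                doNothing();
            }
        }
    }
}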
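If the outer loop really has to run many times, the C++11 counterpart of what the OpenMP runtime does is to create the workers once and reuse them. Below is a minimal sketch of a fixed-size thread pool, assuming tasks do not throw; the class and method names (ThreadPool, submit, wait) are hypothetical and not from any library.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size thread pool (a sketch, not production code).
// Worker threads are started once in the constructor and reused for
// every submitted task; tasks are assumed not to throw.
class ThreadPool
{
public:
    explicit ThreadPool(unsigned numThreads)
    {
        assert(numThreads > 0);
        for (unsigned i = 0; i < numThreads; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }

    ~ThreadPool()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        taskReady_.notify_all();
        for (auto& w : workers_) w.join();  // workers drain the queue, then exit
    }

    // Queue a task to be run by one of the existing workers.
    void submit(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
            ++pending_;
        }
        taskReady_.notify_one();
    }

    // Block until every submitted task has finished, similar to the
    // implicit barrier at the end of an OpenMP parallel region.
    void wait()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        allDone_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void workerLoop()
    {
        for (;;)
        {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                taskReady_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (--pending_ == 0) allDone_.notify_all();
            }
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable taskReady_;
    std::condition_variable allDone_;
    std::size_t pending_ = 0;
    bool stopping_ = false;
};
```

With a pool constructed once before the outer loop, the algorithmToRun == 1 branch could become 16 calls to pool.submit(doNothing) followed by pool.wait(). No threads are created inside the loop; what remains is the cost of locking and signalling the task queue.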