Why is OpenMP outperforming threads?

余生分开走 2020-12-19 16:59

I've been calling this in OpenMP:

#pragma omp parallel for num_threads(totalThreads)
for(unsigned i=0; i

2 Answers
  • 2020-12-19 17:22

    Where does totalThreads come from in your OpenMP version? I bet it's not startIndex.size().

    The OpenMP version queues the requests onto totalThreads worker threads. The C++11 version appears to create startIndex.size() threads, which involves a ridiculous amount of overhead if that number is large.
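    The question's loop body is cut off, so this is only a sketch under assumptions: it treats the body as a callable that receives the loop index, and the helper name parallelFor is hypothetical. It shows the C++11 counterpart of what the OpenMP version does: split the index range into totalThreads contiguous blocks and start one std::thread per block, rather than one thread per index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: run body(i) for every i in [0, count) using at most
// totalThreads std::threads, each handling one contiguous block of indices.
// This roughly mirrors a statically scheduled "omp parallel for".
void parallelFor(std::size_t count, unsigned totalThreads,
                 const std::function<void(std::size_t)>& body)
{
    assert(totalThreads > 0);
    std::vector<std::thread> workers;
    workers.reserve(totalThreads);
    // Round up so that every index lands in some block.
    const std::size_t chunk = (count + totalThreads - 1) / totalThreads;
    for (unsigned t = 0; t < totalThreads; ++t)
    {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(count, begin + chunk);
        if (begin >= end) break;  // fewer indices than threads
        workers.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    // Wait for every block, like the barrier at the end of the OpenMP loop.
    for (auto& w : workers) w.join();
}
```

    Called as parallelFor(startIndex.size(), totalThreads, body), it creates totalThreads threads per call however large startIndex is. It still pays the thread-creation cost on every call, which is the cost an OpenMP runtime avoids by keeping its threads alive.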

  • 2020-12-19 17:30

    Consider the following code. The OpenMP version runs in 0 seconds, while the C++11 version runs in 50 seconds. This is not because the function is doNothing, and it is not because the vector is declared inside the loop. The C++11 threads are created and then destroyed on every iteration. OpenMP, on the other hand, effectively uses a thread pool: the runtime keeps its worker threads alive between parallel regions and reuses them. That isn't mandated by the standard, but it's what Intel's and AMD's implementations do.

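    #include <cstdlib>
    #include <thread>
    #include <vector>
    using namespace std;

    void doNothing() {}

    int main(int argc, char* argv[])
    {
        // 1 = one std::thread per task, 2 = OpenMP (compile with -fopenmp)
        int algorithmToRun = (argc > 1) ? atoi(argv[1]) : 1;
        for(int j=1; j<100000; ++j)
        {
            if(algorithmToRun == 1)
            {
                // 16 threads are created and destroyed on every iteration
                vector<thread> threads;
                for(int i=0; i<16; i++)
                {
                    threads.emplace_back(doNothing);
                }
                for(auto& t : threads) t.join();
            }
            else if(algorithmToRun == 2)
            {
                // the OpenMP runtime reuses its worker threads across iterations
                #pragma omp parallel for num_threads(16)
                for(unsigned i=0; i<16; i++)
                {
                    doNothing();
                }
            }
        }
    }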
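    To make the thread-pool point concrete, here is a minimal C++11 pool in which the workers are started once and then fed tasks through a queue. It is a sketch of the general idea, not how Intel's, AMD's, or any other OpenMP runtime is actually implemented; the class ThreadPool and its submit and wait members are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal thread pool: worker threads are created once in the
// constructor and reused for every submitted task until the pool is destroyed.
class ThreadPool
{
public:
    explicit ThreadPool(unsigned numThreads)
    {
        assert(numThreads > 0);
        for (unsigned i = 0; i < numThreads; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }

    ~ThreadPool()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        taskReady_.notify_all();
        for (auto& w : workers_) w.join();  // workers drain the queue first
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queue a task; an idle worker picks it up.
    void submit(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
            ++pending_;
        }
        taskReady_.notify_one();
    }

    // Block until every submitted task has finished running.
    void wait()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        allDone_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void workerLoop()
    {
        for (;;)
        {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                taskReady_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so workers proceed in parallel
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (--pending_ == 0) allDone_.notify_all();
            }
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable taskReady_;
    std::condition_variable allDone_;
    std::size_t pending_ = 0;
    bool stopping_ = false;
};
```

    In the benchmark above, a ThreadPool pool(16) constructed before the outer loop, with 16 calls to pool.submit(doNothing) followed by pool.wait() in each iteration, would replace the create/join pair, so the 16 threads are created once rather than on each of the 99,999 iterations. Here wait() plays the role of the implicit barrier at the end of the OpenMP parallel region.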
    