Confused about firstprivate and threadprivate in OpenMP context

鱼传尺愫 2020-12-12 05:30

Say I have packed some resources in an object, and then perform some computation based on the resources. What I normally do is to initialise the objects outside the parallel

1 Answer
  • 2020-12-12 05:58

    When an object is declared firstprivate, the copy constructor is called, whereas when private is used the default constructor is called. We'll address threadprivate below. Proof (Intel C++ 15.0):

    #include <iostream>
    #include <omp.h>
    
    class myclass {
        int _n;
    public:
        myclass(int n) : _n(n) { std::cout << "int c'tor\n"; }
    
        myclass() : _n(0) { std::cout << "def c'tor\n"; }
    
        myclass(const myclass & other) : _n(other._n)
        { std::cout << "copy c'tor\n"; }
    
        ~myclass() { std::cout << "bye bye\n"; }
    
        void print() { std::cout << _n << "\n"; }
    
        void add(int t) { _n += t; }
    };
    
    myclass globalClass;
    
    #pragma omp threadprivate (globalClass)
    
    int main(int argc, char* argv[])
    {
        std::cout << "\nBegninning main()\n";
    
        myclass inst(17);
    
        std::cout << "\nEntering parallel region #0 (using firstprivate)\n";
    #pragma omp parallel firstprivate(inst)
        {
            std::cout << "Hi\n";
        }
    
        std::cout << "\nEntering parallel region #1 (using private)\n";
    #pragma omp parallel private(inst)
        {
            std::cout << "Hi\n";
        }
    
        std::cout << "\nEntering parallel region #2 (printing the value of "
                        "the global instance(s) and adding the thread number)\n";
    #pragma omp parallel
        {
            globalClass.print();
            globalClass.add(omp_get_thread_num());
        }
    
        std::cout << "\nEntering parallel region #3 (printing the global instance(s))\n";
    #pragma omp parallel
        {
            globalClass.print();
        }
    
        std::cout << "\nAbout to leave main()\n";
        return 0;
    }
    

    gives

    def c'tor

    Begninning main()
    int c'tor

    Entering parallel region #0 (using firstprivate)
    copy c'tor
    Hi
    bye bye
    copy c'tor
    Hi
    bye bye
    copy c'tor
    Hi
    bye bye
    copy c'tor
    Hi
    bye bye

    Entering parallel region #1 (using private)
    def c'tor
    Hi
    bye bye
    def c'tor
    Hi
    bye bye
    def c'tor
    Hi
    bye bye
    def c'tor
    Hi
    bye bye

    Entering parallel region #2 (printing the value of the global instance(s) and adding the thread number)
    def c'tor
    0
    def c'tor
    0
    def c'tor
    0
    0

    Entering parallel region #3 (printing the global instance(s))
    0
    1
    2
    3

    About to leave main()
    bye bye
    bye bye

    If the copy constructor does a deep copy, then each thread gets a deep copy of your object. A copy constructor you write yourself should do this; the implicitly generated one copies member by member, which is deep for members such as std::vector but shallow for raw pointers to dynamically allocated data. This is in contrast to private, which default-constructs the private copy rather than initializing it from the existing object.

    threadprivate works totally differently. To start with, it applies only to global or static variables. Even more critically, it is a directive in its own right rather than a clause, and it takes no clauses of its own. You write the threadprivate pragma after the variable's declaration and before its first use, and later write #pragma omp parallel before the parallel block as usual. There are other differences (where in memory the object is stored, etc.), but that's a good start.

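    Let's analyze the above output. First, note that in region #2 the default constructor is called three times, each call creating a new copy of the global variable that is private to one of the non-master threads. This is because those per-thread copies don't exist until the first parallel region that uses the variable. The master thread apparently keeps using the instance constructed before main() (the first "def c'tor" in the output).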

    Next, and what NoseKnowsAll considers the most crucial difference, the threadprivate global variables persist across parallel regions. In region #3 there is no construction, and we see that the OMP thread number added in region #2 is retained. Also note that no destructor is called in regions #2 and #3; the destructors run only after leaving main(). Only two calls appear there: one for inst and one for the master thread's copy of globalClass. The other threads' copies are apparently never destroyed, which may be a bug.

    This brings us to why I used the Intel compiler. Visual Studio 2013 as well as g++ (4.6.2 on my computer, Coliru (g++ v5.2), codingground (g++ v4.9.2)) allow only POD types in threadprivate (source). This has been listed as a bug for almost a decade and still hasn't been fully addressed. The Visual Studio error given is

    error C3057: 'globalClass' : dynamic initialization of 'threadprivate' symbols is not currently supported

    and the error given by g++ is

    error: 'globalClass' declared 'threadprivate' after first use

    The Intel compiler works with classes.

    One more note. If you want every thread's copy to start with the master thread's value, you can use #pragma omp parallel copyin(globalVarName). Note that this does not work with classes as in our example above (hence I left it out).

    Sources: OMP tutorial: private, firstprivate, threadprivate
