Say I have packed some resources in an object, and then perform some computation based on the resources. What I normally do is to initialise the objects outside the parallel region.
When an object is declared `firstprivate`, the copy constructor is called once for each thread, whereas when `private` is used the default constructor is called. We'll address `threadprivate` below. Proof (Intel C++ 15.0):
```cpp
#include <iostream>
#include <omp.h>

class myclass {
    int _n;
public:
    myclass(int n) : _n(n) { std::cout << "int c'tor\n"; }
    myclass() : _n(0) { std::cout << "def c'tor\n"; }
    myclass(const myclass & other) : _n(other._n)
    { std::cout << "copy c'tor\n"; }
    ~myclass() { std::cout << "bye bye\n"; }
    void print() { std::cout << _n << "\n"; }
    void add(int t) { _n += t; }
};

// Global instance with one copy per thread; the master thread uses this original.
myclass globalClass;
#pragma omp threadprivate (globalClass)

int main(int argc, char* argv[])
{
    std::cout << "\nBeginning main()\n";
    myclass inst(17);

    std::cout << "\nEntering parallel region #0 (using firstprivate)\n";
    // Each thread gets a copy of inst made by the copy constructor.
    #pragma omp parallel firstprivate(inst)
    {
        std::cout << "Hi\n";
    }

    std::cout << "\nEntering parallel region #1 (using private)\n";
    // Each thread gets a default-constructed object, unrelated to inst.
    #pragma omp parallel private(inst)
    {
        std::cout << "Hi\n";
    }

    std::cout << "\nEntering parallel region #2 (printing the value of "
                 "the global instance(s) and adding the thread number)\n";
    #pragma omp parallel
    {
        globalClass.print();                    // this thread's own copy
        globalClass.add(omp_get_thread_num());
    }

    std::cout << "\nEntering parallel region #3 (printing the global instance(s))\n";
    // The values added in region #2 are still visible here.
    #pragma omp parallel
    {
        globalClass.print();
    }

    std::cout << "\nAbout to leave main()\n";
    return 0;
}
```

gives (with four threads):

```
def c'tor

Beginning main()
int c'tor

Entering parallel region #0 (using firstprivate)
copy c'tor
Hi
bye bye
copy c'tor
Hi
bye bye
copy c'tor
Hi
bye bye
copy c'tor
Hi
bye bye

Entering parallel region #1 (using private)
def c'tor
Hi
bye bye
def c'tor
Hi
bye bye
def c'tor
Hi
bye bye
def c'tor
Hi
bye bye

Entering parallel region #2 (printing the value of the global instance(s) and adding the thread number)
def c'tor
0
def c'tor
0
def c'tor
0
0

Entering parallel region #3 (printing the global instance(s))
0
1
2
3

About to leave main()
bye bye
bye bye
```
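With `firstprivate`, each thread's copy is built by the copy constructor, so you get whatever that constructor does. The implicitly generated copy constructor copies member by member: members such as `std::vector` are deep-copied by their own copy constructors, but a raw pointer to dynamically allocated data is copied only as a pointer (a shallow copy). In that case you should write your own copy constructor that does a deep copy. This is as opposed to `private`, which doesn't initialize the private copy from an existing object at all.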
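To make the deep-versus-shallow point concrete, here is a minimal sketch with two hypothetical holder types (`RawHolder`, `VecHolder`, not part of the example above). Since `firstprivate` initializes each thread's copy with the copy constructor, a single explicit copy is enough to show what every thread would receive; only the implicitly generated copy constructors are used.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical holder with a raw pointer. Its implicit copy constructor copies
// the pointer value, so a copy aliases the original's buffer (shallow copy).
struct RawHolder {
    double* buf;
    std::size_t n;
};

// Hypothetical holder with a std::vector. Its implicit copy constructor calls
// std::vector's copy constructor, which allocates a new buffer (deep copy).
struct VecHolder {
    std::vector<double> buf;
};

// Copies the holder the way firstprivate would, writes through the copy, and
// reports whether the write is visible through the original.
bool raw_copy_aliases() {
    std::vector<double> storage(3, 1.0);  // owns the memory; RawHolder only points at it
    RawHolder original{storage.data(), storage.size()};
    RawHolder copy(original);             // implicit copy constructor
    copy.buf[0] = 99.0;
    return original.buf[0] == 99.0;       // true: both point at storage
}

bool vec_copy_aliases() {
    VecHolder original{std::vector<double>(3, 1.0)};
    VecHolder copy(original);             // implicit copy constructor
    copy.buf[0] = 99.0;
    return original.buf[0] == 99.0;       // false: the copy has its own buffer
}
```

With `RawHolder` in a `firstprivate` clause, every thread would write into the same buffer, which is a data race; holding a `std::vector` (or writing a copy constructor that allocates) gives each thread independent data.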
`threadprivate` works totally differently. To start with, it's only for variables with static storage duration (globals, namespace-scope variables, and `static` locals or class members). Even more critical, it's a directive in and of itself and supports no other clauses. You write the `threadprivate` pragma after the variable's declaration and before any reference to it, and later write `#pragma omp parallel` before the parallel block as usual. There are other differences (where in memory the object is stored, etc.) but that's a good start.
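As a minimal sketch of the placement rule for a function-scope `static`: the directive must sit in the same scope as the variable, after its declaration and before any use. The helper names are hypothetical, and an `int` is used because some compilers reject class types here, as discussed below.

```cpp
// Hypothetical helper: counts how often the calling thread has entered it.
int calls_on_this_thread() {
    static int calls = 0;               // static storage duration: allowed
    #pragma omp threadprivate(calls)    // same scope, after the declaration
    return ++calls;
}

// Every thread calls the helper twice. Returns true if each thread saw 1 and
// then 2, i.e. no thread observed another thread's increments.
bool counters_are_per_thread() {
    int bad = 0;
    #pragma omp parallel reduction(+ : bad)
    {
        int first = calls_on_this_thread();
        int second = calls_on_this_thread();
        if (first != 1 || second != 2) bad += 1;
    }
    return bad == 0;
}
```

Compile with OpenMP enabled (for example `-fopenmp` with g++); without it the pragmas are ignored and the code runs on a single thread. Because the copies persist, a second call to `counters_are_per_thread()` would see 3 and 4.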
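Let's analyze the above output. First, note that on entering region #2 the default constructor is called in each non-master thread, creating a new copy of the global variable private to that thread. This is because region #2 is the first region that references `globalClass`, so those per-thread copies don't exist yet; the master thread keeps using the original object constructed before `main()`, which is why there are only three `def c'tor` lines. (The OpenMP specification only requires each copy to be initialized at some unspecified point before its first reference.)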
Next, and this is what NoseKnowsAll considers the most crucial difference, threadprivate global variables persist across parallel regions. In region #3 there is no construction, and we see that the OpenMP thread number added in region #2 is retained. Also note that no destructor is called in regions #2 and #3; destruction happens only after leaving `main()`, and only for one copy, the master's (the other `bye bye` belongs to `inst`). The copies belonging to the other threads are never destroyed, for some reason. This may be a bug...
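Persistence can be checked with a plain `int`, sketched below under stated assumptions (`tp_value` and the helper functions are hypothetical). Per the OpenMP specification, values are only guaranteed to persist between two regions when, among other conditions, neither region is nested, both use the same number of threads, and dynamic adjustment of the thread count is off, hence the `omp_set_dynamic(0)` call.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical threadprivate global; an int so that every compiler accepts it.
int tp_value = 0;
#pragma omp threadprivate(tp_value)

// Small wrappers so the sketch also builds (serially) without OpenMP.
static int thread_id() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

static int team_size() {
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}

// Region 1 stores a per-thread value; region 2 checks that each thread still
// sees its own value. Returns true if every thread's copy persisted.
bool values_persist_across_regions() {
#ifdef _OPENMP
    omp_set_dynamic(0);  // persistence is only guaranteed with dynamic adjustment off
#endif
    int threads = 1;
    #pragma omp parallel
    {
        tp_value = 100 + thread_id();  // writes this thread's copy only
        #pragma omp single
        threads = team_size();
    }
    int persisted = 0;
    #pragma omp parallel reduction(+ : persisted)
    {
        if (tp_value == 100 + thread_id()) persisted += 1;
    }
    return persisted == threads;
}
```

After the call, code outside any parallel region sees the master thread's copy, so `tp_value` is 100 there, just as `main()` would see the master's `globalClass` in the example.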
This brings us to why I used the Intel compiler. Visual Studio 2013 as well as g++ (4.6.2 on my computer, v5.2 on Coliru, v4.9.2 on codingground) allow only POD types (source). This has been listed as a bug for almost a decade and still hasn't been fully addressed. The Visual Studio error given is
error C3057: 'globalClass' : dynamic initialization of 'threadprivate' symbols is not currently supported
and the error given by g++ is
error: 'globalClass' declared 'threadprivate' after first use
The Intel compiler works with classes.
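If you are stuck with a compiler that only accepts POD types, one workaround (an assumption on my part, not something tested in the run above) is to make a plain pointer `threadprivate` and allocate the object on first use in each thread. `Accumulator`, `tp_acc` and `local_acc()` are hypothetical names.

```cpp
#include <vector>

// Hypothetical class type standing in for myclass (non-POD: it holds a vector).
struct Accumulator {
    std::vector<int> items;
    void add(int t) { items.push_back(t); }
    int total() const {
        int s = 0;
        for (int v : items) s += v;
        return s;
    }
};

// Only the pointer is threadprivate; a constant-initialized pointer is POD.
Accumulator* tp_acc = nullptr;
#pragma omp threadprivate(tp_acc)

// Returns the calling thread's Accumulator, creating it on first use.
// Note: this sketch never deletes the per-thread objects.
Accumulator& local_acc() {
    if (tp_acc == nullptr) tp_acc = new Accumulator();
    return *tp_acc;
}
```

Inside a parallel region each thread would call something like `local_acc().add(omp_get_thread_num())`, mirroring region #2 of the example; the leak of the per-thread objects is the price of avoiding dynamic initialization of the threadprivate variable itself.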
One more note: if you want each thread's copy to start with the value of the master thread's variable, you can use `#pragma omp parallel copyin(globalVarName)`. Note that this does not work with classes as in our example above (hence I left it out).
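A minimal sketch of `copyin` with a POD variable (the hypothetical `tp_seed` rather than the class from the example): the master thread sets its copy, and `copyin` gives every thread in the team that value on entry to the region.

```cpp
// Hypothetical threadprivate int used to demonstrate copyin.
int tp_seed = 0;
#pragma omp threadprivate(tp_seed)

// Sets the master thread's copy, then checks that copyin gave every thread in
// the team that value on entry to the region.
bool copyin_broadcasts(int value) {
    tp_seed = value;  // only the master thread's copy changes here
    int mismatches = 0;
    #pragma omp parallel copyin(tp_seed) reduction(+ : mismatches)
    {
        if (tp_seed != value) mismatches += 1;
    }
    return mismatches == 0;
}
```

Without the `copyin` clause, the non-master threads would start the region with whatever their own copies last held (initially 0).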
Sources: OMP tutorial: private, firstprivate, threadprivate