Consider the following snippet:
#include
Anything threadprivate will be replicated for each thread. I have done this by making a static object (the class itself does not need to be static; only the instantiated object must be). Maybe this is what you want?
Now suppose you want some members of the class to be shared between threads. If only some members are static (and therefore threadprivate), then each thread's instance would have to replicate just that static part, not the entire object, because shared memory is not replicated. One object would then hold everything, while all the others would be smaller (they would not store the shared data again) yet would still need a reference to it, which quite frankly doesn't make sense.
As a suggestion, make yourself two classes, one with strictly (thread)private data and one for shared data.
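The two-class split could look roughly like the sketch below. All names (PerThreadState, SharedConfig, countAll) are hypothetical and not taken from the question. The per-thread class is kept as a plain aggregate on the assumption that the compiler rejects dynamic initialization of threadprivate variables.

```cpp
#include <cassert>

// Hypothetical class holding strictly thread-private data. It is a plain
// aggregate, so the static object below needs no dynamic initialization.
struct PerThreadState {
    int hits;
};

// Hypothetical class holding data that every thread reads from one copy.
struct SharedConfig {
    int iterations;
};

static PerThreadState perThread;            // zero-initialized static object
#pragma omp threadprivate(perThread)        // one copy per thread

static SharedConfig sharedConfig = { 1000 };  // single shared copy

// Each thread counts into its own perThread copy; the per-thread counts are
// then summed with a reduction. Without OpenMP the pragmas are ignored and
// the function runs serially.
int countAll() {
    int total = 0;
    #pragma omp parallel reduction(+:total)
    {
        perThread.hits = 0;
        for (int i = 0; i < sharedConfig.iterations; ++i)
            ++perThread.hits;
        // No other thread touched this copy, so the count is exact.
        assert(perThread.hits == sharedConfig.iterations);
        total += perThread.hits;
    }
    return total;
}
```

Calling countAll() should return sharedConfig.iterations times the number of threads in the team, since every thread counts in its own copy.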
The incomplete type error is a bug in the compiler, which can be worked around by instantiating std::map<int,int> before the threadprivate directive. But once you get past that issue, GCC 4.7 still doesn't support dynamic initialization of threadprivate variables. This will be supported in GCC 4.8.
This is a compiler restriction. The Intel C/C++ compiler supports C++ classes in threadprivate, while GCC and MSVC currently do not.
For example, in MSVC (VS 2010), you will get this error (I removed the class):
static std::map<int,int> theMap;
#pragma omp threadprivate(theMap)
error C3057: 'theMap' : dynamic initialization of 'threadprivate' symbols is not currently supported
So, the workaround is pretty obvious, but dirty. You need to implement a very simple thread-local storage yourself. A simple approach would be:
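static const int MAX_THREAD = 64;  // upper bound on the number of threads
struct MY_TLS_ITEM
{
    std::map<int,int> theMap;
    // Pad each item to 64 bytes; assumes sizeof(std::map<int,int>) < 64.
    char padding[64 - sizeof(std::map<int,int>)];
};
__declspec(align(64)) MY_TLS_ITEM tls[MAX_THREAD];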
Note that the padding is there to avoid false sharing. I assume a 64-byte cache line, as on modern Intel x86 processors. __declspec(align(64)) is an MSVC extension that places the array on a 64-byte boundary. So every element of tls is located on a different cache line, resulting in no false sharing. GCC has __attribute__ ((aligned(64))) for the same purpose.
In order to access this simple TLS, you can do this:
tls[omp_get_thread_num()].theMap;
Of course, you should call this inside one of the OpenMP parallel constructs. The nice thing is that OpenMP provides an abstracted thread ID in [0, N), where N is the number of threads in the team. This enables a fast and simple TLS implementation. In general, a native TID from the operating system is an arbitrary integer, so you would usually need a hash table keyed by TID, whose access time is longer than indexing a simple array.
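Putting the pieces together, a portable variant might look like the sketch below. It swaps the MSVC-specific __declspec(align(64)) for the C++11 alignas specifier, which is my substitution rather than what the answer above uses. The names TlsItem, CACHE_LINE and fillSlots are hypothetical, and the team size of 4 is arbitrary. The serial fallback for omp_get_thread_num() is only there so the sketch also builds when OpenMP is not enabled.

```cpp
#include <cassert>
#include <map>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallback (assumption): behave like a team with one thread.
static int omp_get_thread_num() { return 0; }
#endif

const int MAX_THREAD = 64;  // assumed upper bound on the team size
const int CACHE_LINE = 64;  // assumed cache-line size in bytes

// alignas on the type rounds sizeof(TlsItem) up to a multiple of CACHE_LINE,
// so neighbouring array elements never share a cache line and no manual
// padding member is needed.
struct alignas(CACHE_LINE) TlsItem {
    std::map<int, int> theMap;
};

static TlsItem tls[MAX_THREAD];

// Each thread writes only to its own slot, indexed by its OpenMP thread ID.
// Returns the number of slots that were used.
int fillSlots() {
    #pragma omp parallel num_threads(4)
    {
        int tid = omp_get_thread_num();
        assert(tid >= 0 && tid < MAX_THREAD);
        tls[tid].theMap[tid] = tid * tid;
    }
    int used = 0;
    for (int i = 0; i < MAX_THREAD; ++i)
        if (!tls[i].theMap.empty())
            ++used;
    return used;
}
```

The alignment only keeps the map objects themselves apart. The nodes a std::map allocates live on the heap, so their placement is up to the allocator.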