I would like to instantiate a class in CUDA code that shares some of its members with other threads in the same block.
However, when trying to compile the following
Objects marked as __shared__ reside in shared memory, which is allocated per thread block. It is limited in size and has the same lifetime as the thread block.
That is why you cannot declare class members as __shared__: their lifetime would be managed by the thread block, not by the class instance. Static class members could possibly be shared, but I have not checked.
See section B.2.3 of the CUDA Programming Guide for details.
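To illustrate the per-block lifetime, here is a minimal sketch. The names `countThreadsPerBlock`, `blockCounter`, and `runCountExample` are made up for this example and are not from the question. Each block gets its own copy of a `__shared__` variable declared inside the kernel, so every block should end up reporting its own thread count:

```cuda
#include <vector>
#include <cuda_runtime.h>

// Not allowed: memory space specifiers cannot be applied to data members.
// struct Bad { __shared__ int x; };

__global__ void countThreadsPerBlock(int *perBlockCount)
{
    // One copy of this variable exists per thread block. It comes into
    // existence when the block starts and is gone when the block finishes.
    __shared__ int blockCounter;

    if (threadIdx.x == 0) blockCounter = 0;
    __syncthreads();

    atomicAdd(&blockCounter, 1);  // every thread of this block bumps the same counter
    __syncthreads();

    if (threadIdx.x == 0) perBlockCount[blockIdx.x] = blockCounter;
}

// Hypothetical host helper: launches the kernel and returns one count per block.
// Entries stay at -1 if device allocation fails.
std::vector<int> runCountExample(int numBlocks, int threadsPerBlock)
{
    std::vector<int> host(numBlocks, -1);
    int *dev = nullptr;
    if (cudaMalloc(&dev, numBlocks * sizeof(int)) != cudaSuccess) return host;

    countThreadsPerBlock<<<numBlocks, threadsPerBlock>>>(dev);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host.data(), dev, numBlocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

Because the counter belongs to the block rather than to any object, there is nothing for a class instance to own, which matches the restriction described above.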
Rost explained the rationale behind the limitation. To answer the second part of the question, a simple workaround is to have the kernel declare the shared memory and hand a pointer to it to the class, which stores it as a member, e.g. in the class constructor. Example:
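#include <cstdio>

#define BLOCKDIM 128  // assumed value; must match the block size used at launch

class Foo
{
public:
    // Stores the pointer to the block's shared array and stages this thread's element in it.
    __device__
    Foo(int *sPtr, int *gPtr) : sharedPointer(sPtr) {
        sharedPointer[threadIdx.x] = gPtr[blockIdx.x * blockDim.x + threadIdx.x];
        __syncthreads();  // must be reached by every thread in the block
    }
    __device__
    void useSharedData() { printf("my data: %d\n", sharedPointer[threadIdx.x]); }
private:
    int *sharedPointer;  // points into shared memory declared by the kernel
};

__global__ void example(int *gData)
{
    __shared__ int sData[BLOCKDIM];  // shared memory is declared in the kernel, not in the class
    Foo f(sData, gData);
    f.useSharedData();
}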
Caveat: code written in browser, unverified, untested (and it's a trivial example, but the concept extends to real code—I have used this technique myself).
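For what it's worth, here is a fuller sketch of the same idea with a host-side launch, so the sharing between threads is actually visible. The class name `NeighborReader`, the constant `kBlockDim`, and the helper `runNeighborExample` are all made up for this example, and it assumes the input length is an exact multiple of the block size. Each thread stages its own element through the class and then reads the element staged by the next thread in its block:

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int kBlockDim = 64;  // assumed block size; must match the launch configuration

class NeighborReader
{
public:
    // Stores the pointer to the block's shared array and stages this thread's element.
    __device__ NeighborReader(int *sPtr, const int *gPtr) : sharedPointer(sPtr)
    {
        sharedPointer[threadIdx.x] = gPtr[blockIdx.x * blockDim.x + threadIdx.x];
        __syncthreads();  // every thread in the block must construct the object
    }

    // Returns the value staged by the next thread in the block (wrapping around).
    __device__ int neighborValue() const
    {
        return sharedPointer[(threadIdx.x + 1) % blockDim.x];
    }

private:
    int *sharedPointer;  // points into shared memory owned by the block
};

__global__ void neighborKernel(const int *gIn, int *gOut)
{
    __shared__ int sData[kBlockDim];  // declared by the kernel, used through the class
    NeighborReader reader(sData, gIn);
    gOut[blockIdx.x * blockDim.x + threadIdx.x] = reader.neighborValue();
}

// Hypothetical host helper: fills the input with 0, 1, 2, ... and returns the output.
// Entries stay at -1 if device allocation fails.
std::vector<int> runNeighborExample(int numBlocks)
{
    const int n = numBlocks * kBlockDim;
    std::vector<int> input(n);
    for (int i = 0; i < n; ++i) input[i] = i;
    std::vector<int> result(n, -1);

    int *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, n * sizeof(int)) != cudaSuccess) return result;
    if (cudaMalloc(&dOut, n * sizeof(int)) != cudaSuccess) { cudaFree(dIn); return result; }

    cudaMemcpy(dIn, input.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    neighborKernel<<<numBlocks, kBlockDim>>>(dIn, dOut);
    cudaMemcpy(result.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

One thing to watch with this pattern: since the constructor calls `__syncthreads()`, the object has to be constructed by all threads of the block, not inside a branch that only some threads take.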