I would like to instantiate a class in CUDA code that shares some of its members with the other threads in the same block.
However, when trying to compile the following
Rost explained the rationale behind the limitation. To answer the second part of the question, a simple workaround is to have the kernel declare the shared memory and initialize a pointer to it owned by the class, e.g. in the class constructor. Here is an example:
#define BLOCKDIM 256   // block size; any fixed value works

class Foo
{
public:
    __device__
    Foo(int *sPtr, int *gPtr) : sharedPointer(sPtr) {
        // Each thread stages its own element from global into shared memory
        sharedPointer[threadIdx.x] = gPtr[blockIdx.x * blockDim.x + threadIdx.x];
        __syncthreads();
    }

    __device__
    void useSharedData() { printf("my data: %d\n", sharedPointer[threadIdx.x]); }

private:
    int *sharedPointer;
};

__global__ void example(int *gData)
{
    __shared__ int sData[BLOCKDIM];
    Foo f(sData, gData);
    f.useSharedData();
}
Caveat: code written in browser, unverified, untested (and it's a trivial example, but the concept extends to real code—I have used this technique myself).
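For completeness, a minimal host-side launch of the kernel above might look like the sketch below. The single-block grid, the data setup, and the use of `cudaDeviceSynchronize` are my assumptions for illustration, not part of the original answer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumes BLOCKDIM, Foo, and example() are defined as in the answer above.

int main()
{
    const int n = BLOCKDIM;  // one block for this sketch
    int *gData;
    cudaMalloc(&gData, n * sizeof(int));

    // Fill device memory with some recognizable values
    int hData[BLOCKDIM];
    for (int i = 0; i < n; ++i) hData[i] = i;
    cudaMemcpy(gData, hData, n * sizeof(int), cudaMemcpyHostToDevice);

    // Each thread constructs its own Foo over the block's shared tile
    example<<<1, BLOCKDIM>>>(gData);
    cudaDeviceSynchronize();

    cudaFree(gData);
    return 0;
}
```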