I'm trying to implement something like this in CUDA:
For each element $p$:

$$
p = \begin{cases} p & \text{if } p \ge \mathit{floor} \\ z & \text{if } p < \mathit{floor} \end{cases}
$$

where $\mathit{floor}$ and $z$ are constants configured at the start of the test.
I have attempted to implement it as follows, but I get the error "too many resources requested for launch".
A functor:
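```cpp
#include <thrust/functional.h>

struct floor_functor : thrust::unary_function<float, float> {
    const float floorLevel, floorVal;
    floor_functor(float _floorLevel, float _floorVal) : floorLevel(_floorLevel), floorVal(_floorVal) {}
    // __host__ __device__ so thrust::transform can call it inside its device kernel
    __host__ __device__ float operator()(const float& x) const {
        if (x >= floorLevel)
            return x;   // at or above the floor: keep the value
        return floorVal; // below the floor: replace it
    }
};
```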
Used by a transform:

```cpp
thrust::transform(input->begin(), input->end(), output.begin(), floor_functor(floorLevel, floorVal));
```
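To check the functor's logic separately from the GPU launch, the same rule can be run on the host with `std::transform`, which takes the same (first, last, result, op) arguments as `thrust::transform`. This is a minimal sketch under that assumption, not a fix for the launch error; `host_floor_functor` and `apply_floor` are hypothetical names introduced only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical host-side copy of the floor functor's logic.
struct host_floor_functor {
    float floorLevel, floorVal;

    host_floor_functor(float level, float val) : floorLevel(level), floorVal(val) {}

    // Same rule as the device functor: keep x if x >= floorLevel,
    // otherwise replace it with floorVal.
    float operator()(const float& x) const {
        return x >= floorLevel ? x : floorVal;
    }
};

// Hypothetical helper: applies the functor element-wise with std::transform,
// mirroring the thrust::transform call.
std::vector<float> apply_floor(const std::vector<float>& input,
                               float floorLevel, float floorVal) {
    std::vector<float> output(input.size());
    std::transform(input.begin(), input.end(), output.begin(),
                   host_floor_functor(floorLevel, floorVal));
    return output;
}
```

Comparing `apply_floor` on a copy of the input against the GPU result is a cheap way to confirm that only the launch, not the element-wise logic, is failing.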
If I remove one of the members of my functor, say `floorVal`, and use a functor with only one member variable, it works fine.
Does anyone know why this might be, and how I could fix it?
Additional info:

- My array is 786432 elements long.
- My GPU is a GeForce GTX 590.
- I am building with the command: `nvcc -c -g -arch sm_11 -Xcompiler -fPIC -Xcompiler -Wall -DTHRUST_DEBUG -I <my_include_dir> -o <my_output> <my_source>`
- My CUDA version is 4.0:

```
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Thu_May_12_11:09:45_PDT_2011
Cuda compilation tools, release 4.0, V0.2.1221
```
And my maximum number of threads per block is 1024 (as reported by `deviceQuery`):

```
Total amount of constant memory:               65536 bytes
Total amount of shared memory per block:       49152 bytes
Total number of registers available per block: 32768
Warp size:                                     32
Maximum number of threads per block:           1024
Maximum sizes of each dimension of a block:    1024 x 1024 x 64
Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535
```
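As a rough bound from the `deviceQuery` figures above, a block of the maximum 1024 threads can only launch if each thread uses at most

$$
\frac{32768\ \text{registers/block}}{1024\ \text{threads/block}} = 32\ \text{registers/thread}.
$$

The CUDA runtime documents `cudaErrorLaunchOutOfResources` ("too many resources requested for launch") as usually meaning that too many arguments were passed to the kernel, or that the launch specifies too many threads for the kernel's register count. Whether either limit is what this launch hits is an assumption; these numbers alone do not confirm it.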
I have stumbled upon a fix for my problem, but I do not understand it: if I rename my functor from `floor_functor` to almost anything else, it works! I have no idea why this is the case and would be interested to hear anyone's ideas.
For an easier CUDA implementation, you could do this with ArrayFire in one line of code:
```cpp
p(p < floor) = z;
```

Just declare your variables as `af::array`s.
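For comparison, the same masked-assignment idea can be written in standard C++ with `std::valarray`, whose `operator[]` accepts a `valarray<bool>` mask and returns a `mask_array` that can be assigned a scalar. This is a CPU-only sketch of what the one-liner expresses, not ArrayFire's API; `apply_floor_masked` is a hypothetical name.

```cpp
#include <cassert>
#include <valarray>

// Hypothetical helper: every element of p below floorLevel is replaced
// by z; all other elements are left unchanged.
void apply_floor_masked(std::valarray<float>& p, float floorLevel, float z) {
    std::valarray<bool> mask(p < floorLevel);  // true where p < floorLevel
    p[mask] = z;                               // assign z through the mask
}
```

The mask has the same length as `p` by construction, which `mask_array` assignment requires.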
Good luck!
Disclaimer: I work on all sorts of CUDA projects, including ArrayFire.