I am new here and would like to ask for help with a few questions about CUDA.
As far as I know, my device limits the maximum number of threads per block to 1024, and when I try to parallelize