CUDA's Mersenne Twister for an arbitrary number of threads
Question: CUDA's implementation of the Mersenne Twister (MT) random number generator is limited to at most 256 threads per block and 200 blocks per grid, i.e. at most 51200 threads in total. It is therefore not possible to launch a kernel that uses the MT with kernel<<<blocksPerGrid, threadsPerBlock>>>(devMTGPStates, ...), where int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock; and n is the total number of threads, once n exceeds 51200. What is the best way to use the MT with more than 51200 threads?
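To make the limit concrete, here is a minimal host-side sketch of the launch arithmetic described above. It only restates the formula and the two limits quoted in the question. The names kMaxThreadsPerBlock, kMaxBlocksPerGrid, blocksNeeded and fitsMtLimits are made up for illustration and are not part of any CUDA library.

```cpp
// Limits quoted in the question for CUDA's Mersenne Twister.
// These constant names are illustrative, not library identifiers.
constexpr int kMaxThreadsPerBlock = 256;
constexpr int kMaxBlocksPerGrid   = 200;

// Number of blocks needed to give every one of n elements its own thread
// (the ceiling-division formula from the question).
constexpr int blocksNeeded(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// True if a one-thread-per-element launch stays within both limits.
constexpr bool fitsMtLimits(int n, int threadsPerBlock) {
    return threadsPerBlock <= kMaxThreadsPerBlock &&
           blocksNeeded(n, threadsPerBlock) <= kMaxBlocksPerGrid;
}

// Compile-time checks of the boundary case from the question.
static_assert(blocksNeeded(51200, 256) == 200, "200 blocks of 256 threads");
static_assert(fitsMtLimits(51200, 256), "exactly at the limit still fits");
static_assert(!fitsMtLimits(51201, 256), "one more thread needs a 201st block");
```

For example, n = 100000 with threadsPerBlock = 256 gives blocksNeeded(100000, 256) == 391, well above the 200 blocks allowed, which is exactly the situation the question asks about.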