What's the 'right' way to implement a 32-bit memset for CUDA?

星月不相逢 2021-01-24 04:04

CUDA has the API call

cudaError_t cudaMemset (void *devPtr, int value, size_t count)

which fills a buffer with a single-byte value. I want to f

2 answers
  • 2021-01-24 04:18

    As of about CUDA 3.0, runtime API device pointers (and everything else) are interoperable with the driver API. So yes, you can use cuMemsetD32 to fill a runtime API allocation with a 32-bit value. The size of CUdeviceptr matches the size of void * on your platform, and it is safe to cast a pointer from the CUDA runtime API to CUdeviceptr or vice versa.

  • 2021-01-24 04:29

    Based on talonmies' answer, it seems a reasonable (though ugly) approach would be:

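    #include <stdint.h>
    #include <string.h>        // memcpy
    #include <cuda.h>          // driver API: cuMemsetD8/D16/D32, CUdeviceptr
    #include <cuda_runtime.h>  // cudaError_t

    // count is the number of elements of type T, not the number of bytes.
    template <typename T>
    inline cudaError_t cudaMemsetTyped(void *devPtr, T value, size_t count);

    // Helper introduced here: the driver API returns CUresult, so collapse it
    // to a coarse cudaError_t to keep the runtime-style signature.
    inline cudaError_t cudaMemsetTypedResult(CUresult r) {
        return r == CUDA_SUCCESS ? cudaSuccess : cudaErrorUnknown;
    }

    #define INSTANTIATE_CUDA_MEMSET_TYPED(_nbits) \
    template <> \
    inline cudaError_t cudaMemsetTyped<int ## _nbits ## _t>(void *devPtr, int ## _nbits ## _t value, size_t count) { \
        return cudaMemsetTypedResult(cuMemsetD ## _nbits(reinterpret_cast<CUdeviceptr>(devPtr), static_cast<uint ## _nbits ## _t>(value), count)); \
    } \
    template <> \
    inline cudaError_t cudaMemsetTyped<uint ## _nbits ## _t>(void *devPtr, uint ## _nbits ## _t value, size_t count) { \
        return cudaMemsetTypedResult(cuMemsetD ## _nbits(reinterpret_cast<CUdeviceptr>(devPtr), value, count)); \
    }

    INSTANTIATE_CUDA_MEMSET_TYPED(8)
    INSTANTIATE_CUDA_MEMSET_TYPED(16)
    INSTANTIATE_CUDA_MEMSET_TYPED(32)

    #undef INSTANTIATE_CUDA_MEMSET_TYPED

    template <>
    inline cudaError_t cudaMemsetTyped<float>(void *devPtr, float value, size_t count) {
        uint32_t bits;
        memcpy(&bits, &value, sizeof bits);  // take the float's bit pattern, not its numeric value
        return cudaMemsetTypedResult(cuMemsetD32(reinterpret_cast<CUdeviceptr>(devPtr), bits, count));
    }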
    

    (There seems to be no cuMemsetD64, so there is no 64-bit integer or double variant either.)
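    One detail of the float case is worth spelling out: cuMemsetD32 expects the fill word as an unsigned 32-bit integer, and reinterpret_cast cannot convert a float value to an integer type. A minimal host-only sketch of getting the bit pattern with memcpy follows. The helper name floatBits is hypothetical, and the sketch assumes a 32-bit IEEE 754 float.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the 32-bit pattern that represents `value`,
// suitable for passing as the fill word of a 32-bit memset.
inline std::uint32_t floatBits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "float must be 32 bits wide");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // copies bytes; no numeric conversion
    return bits;
}
```

    With IEEE 754 floats, floatBits(1.0f) is 0x3F800000, whereas a numeric conversion such as static_cast<uint32_t>(1.0f) would give 1 and fill the buffer with the wrong value.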
