Find max/min in CUDA without passing it to the CPU

一向 2021-01-24 08:07

I need to find the index of the maximum element in an array of floats. I am using the function "cublasIsamax", but it returns the index to the CPU, and that is slowing things down.

2 Answers
  • 2021-01-24 08:40

    If you want to use CUBLAS and you have a GPU with compute capability 3.5 (K20, Titan), then you can use CUBLAS with dynamic parallelism. You can then call CUBLAS from within a kernel on the GPU, and no data will be returned to the CPU. If you have no device with cc 3.5, you will probably have to implement a find-max function yourself or look for an additional library.
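For the "implement it yourself" route, a minimal sketch of a hand-written argmax might look like the following. It is an illustration under stated assumptions, not a tuned implementation: the names `argmax_kernel`, `argmax_via_kernel` and `kThreads` are hypothetical, the kernel uses a single block of 256 threads, and it requires at least one element. Ties resolve to one of the maxima, not necessarily the first. The index is written to device memory; the host wrapper copies it back only so the result can be checked.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // assumption: power of two, single block

// Writes the 0-based index of the largest x[i] to *out_idx (device memory).
// Requires n >= 1 and a launch of exactly one block of kThreads threads.
__global__ void argmax_kernel(const float* x, int n, int* out_idx) {
    __shared__ float s_val[kThreads];
    __shared__ int   s_idx[kThreads];

    const int tid = threadIdx.x;
    // Every thread starts from element 0, so its candidate is always valid.
    float best = x[0];
    int best_i = 0;
    for (int i = tid; i < n; i += kThreads) {
        if (x[i] > best) { best = x[i]; best_i = i; }
    }
    s_val[tid] = best;
    s_idx[tid] = best_i;
    __syncthreads();

    // Tree reduction in shared memory over (value, index) pairs.
    for (int stride = kThreads / 2; stride > 0; stride >>= 1) {
        if (tid < stride && s_val[tid + stride] > s_val[tid]) {
            s_val[tid] = s_val[tid + stride];
            s_idx[tid] = s_idx[tid + stride];
        }
        __syncthreads();
    }
    if (tid == 0) *out_idx = s_idx[0];
}

// Hypothetical host wrapper, used here only to exercise the kernel.
int argmax_via_kernel(const std::vector<float>& h_x) {
    const int n = static_cast<int>(h_x.size());
    assert(n > 0);

    float* d_x = nullptr;
    int* d_idx = nullptr;
    cudaMalloc((void**)&d_x, n * sizeof(float));
    cudaMalloc((void**)&d_idx, sizeof(int));
    cudaMemcpy(d_x, h_x.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    argmax_kernel<<<1, kThreads>>>(d_x, n, d_idx);

    // In real use d_idx would be consumed by a later kernel; the copy
    // below exists only to verify the result on the host.
    int h_idx = -1;
    cudaMemcpy(&h_idx, d_idx, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_x);
    cudaFree(d_idx);
    return h_idx;
}
```

A single block keeps the sketch short but will not saturate the GPU on large arrays; a multi-block version would need a second pass to combine the per-block results.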

  • 2021-01-24 09:06

    Since the CUBLAS V2 API was introduced (with CUDA 4.0, IIRC), routines that return a scalar or an index can store the result directly in a variable in device memory, rather than in a host variable (which entails a device-to-host transfer and might leave the result in the wrong memory space).

    To use this, call cublasSetPointerMode with the CUBLAS_POINTER_MODE_DEVICE mode, which tells the CUBLAS context to treat pointers for scalar arguments as device pointers. This then implies that in a call like

    cublasStatus_t cublasIsamax(cublasHandle_t handle, int n,
                                const float *x, int incx, int *result)
    

    that result must be a device pointer.
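Putting the steps above together, a minimal sketch could look like this. The helper name `isamax_on_device` is hypothetical, and the final copy back to the host is there only to make the result checkable; in real use `d_result` would stay on the GPU and be read by a later kernel. Two things to keep in mind about `cublasIsamax`: it returns the index of the element with the largest absolute value, and that index is 1-based.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper: runs cublasIsamax with the result written to device
// memory. Returns the 1-based index of max |x[i]|, or -1 on a CUBLAS error.
int isamax_on_device(const std::vector<float>& h_x) {
    const int n = static_cast<int>(h_x.size());
    assert(n > 0);

    float* d_x = nullptr;
    int* d_result = nullptr;  // the index lives here, in device memory
    cudaMalloc((void**)&d_x, n * sizeof(float));
    cudaMalloc((void**)&d_result, sizeof(int));
    cudaMemcpy(d_x, h_x.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    bool ok = (cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS);

    // Tell CUBLAS that scalar/index arguments are device pointers.
    ok = ok && (cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE)
                == CUBLAS_STATUS_SUCCESS);

    // No device-to-host transfer happens in this call.
    ok = ok && (cublasIsamax(handle, n, d_x, 1, d_result)
                == CUBLAS_STATUS_SUCCESS);

    // Copy back only to verify the result in this sketch.
    int h_result = -1;
    if (ok) {
        cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
    }

    cublasDestroy(handle);
    cudaFree(d_x);
    cudaFree(d_result);
    return h_result;
}
```

For example, with the input {1, -7, 3, 5} the function returns 2, because -7 has the largest magnitude and indices start at 1. If the true maximum (not the maximum magnitude) is needed and the data can be negative, this routine is not a drop-in answer.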
