CUDA: How to access constant memory in a device kernel when the constant memory is declared in the host code?

陌清茗 2021-01-12 14:56

For the record, this is homework, so help as little or as much with that in mind. We are using constant memory to store a "mask matrix" that will be used to perform a convol

2 answers
  • 2021-01-12 15:20

    In "classic" CUDA compilation you must define all code and symbols (textures, constant memory, device functions) and any host API calls which access them (including kernel launches, binding to textures, copying to symbols) within the same translation unit. This means, effectively, in the same file (or via multiple include statements within the same file). This is because "classic" CUDA compilation doesn't include a device code linker.

    Since CUDA 5 was released, there is the possibility of using separate compilation mode and linking different device code objects into a single fatbinary payload on architectures which support it. In that case, you need to declare any __constant__ variables using the extern keyword and define the symbol exactly once.
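
    A minimal sketch of the separate-compilation layout described above, assuming three files whose names (mask.cuh, mask.cu, main.cu), the symbol name mask, the MASK_WIDTH macro, and the helper set_mask are all hypothetical. The files are shown together in one listing and must be split before building. The build line in the trailing comment uses nvcc's -rdc=true option (relocatable device code), which enables device linking.

    ```cuda
    // ---- mask.cuh (hypothetical shared header) ----
    // Declaration only: every .cu file that includes this sees the symbol.
    #include <cuda_runtime.h>
    #define MASK_WIDTH 3  // assumed size, for illustration
    extern __constant__ float mask[MASK_WIDTH * MASK_WIDTH];
    cudaError_t set_mask(const float* h_mask);

    // ---- mask.cu (the single definition of the symbol) ----
    #include "mask.cuh"
    __constant__ float mask[MASK_WIDTH * MASK_WIDTH];

    // Host-side setter that copies the mask into constant memory.
    cudaError_t set_mask(const float* h_mask) {
        return cudaMemcpyToSymbol(mask, h_mask, sizeof(mask));
    }

    // ---- main.cu (kernel in a different translation unit) ----
    #include <stdio.h>
    #include "mask.cuh"

    // Reads the symbol defined in mask.cu; resolved by the device linker.
    __global__ void sum_mask(float* out) {
        float s = 0.f;
        for (int i = 0; i < MASK_WIDTH * MASK_WIDTH; ++i) s += mask[i];
        *out = s;
    }

    int main() {
        float h_mask[MASK_WIDTH * MASK_WIDTH];
        for (int i = 0; i < MASK_WIDTH * MASK_WIDTH; ++i) h_mask[i] = 1.f;
        set_mask(h_mask);

        float h_out = 0.f, *d_out;
        cudaMalloc((void**)&d_out, sizeof(float));
        sum_mask<<<1, 1>>>(d_out);
        cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
        printf("%f\n", h_out);
        cudaFree(d_out);
        return 0;
    }

    // Build (assumed file names):
    //   nvcc -rdc=true mask.cu main.cu -o app
    ```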

    If you can't use separate compilation, then the usual workaround is to define the __constant__ symbol in the same .cu file as your kernel, and include a small host wrapper function which just calls cudaMemcpyToSymbol to set the __constant__ symbol in question. You would probably do the same with kernel calls and texture operations.
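
    The workaround can be sketched roughly as follows. The names d_mask, apply_mask_1d, set_mask, and launch_apply_mask, as well as the mask size, are assumptions made for illustration and do not come from the question. The symbol, the kernel, and every host call that touches them sit in one .cu file; other files would only call the two wrappers.

    ```cuda
    #include <cuda_runtime.h>
    #include <stdio.h>

    #define MASK_WIDTH 3  // assumed size, for illustration

    // Symbol, kernel, and the host calls that touch them share this file.
    __constant__ float d_mask[MASK_WIDTH];

    // Each output element is a mask-weighted sum of its neighbours.
    __global__ void apply_mask_1d(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float acc = 0.f;
        for (int k = 0; k < MASK_WIDTH; ++k) {
            int j = i + k - MASK_WIDTH / 2;
            if (j >= 0 && j < n) acc += in[j] * d_mask[k];  // skip out-of-range neighbours
        }
        out[i] = acc;
    }

    // Host wrapper: copies the mask into the __constant__ symbol.
    cudaError_t set_mask(const float* h_mask) {
        return cudaMemcpyToSymbol(d_mask, h_mask, MASK_WIDTH * sizeof(float));
    }

    // Host wrapper: hides the <<<...>>> launch from callers in other files.
    cudaError_t launch_apply_mask(const float* d_in, float* d_out, int n) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        apply_mask_1d<<<blocks, threads>>>(d_in, d_out, n);
        return cudaGetLastError();
    }

    // main() is here only to keep the sketch in one file.
    int main() {
        const int n = 8;
        float h_in[n], h_out[n];
        float h_mask[MASK_WIDTH] = {1.f, 1.f, 1.f};
        for (int i = 0; i < n; ++i) h_in[i] = 1.f;

        float *d_in, *d_out;
        cudaMalloc((void**)&d_in, n * sizeof(float));
        cudaMalloc((void**)&d_out, n * sizeof(float));
        cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

        set_mask(h_mask);
        launch_apply_mask(d_in, d_out, n);
        cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

        for (int i = 0; i < n; ++i) printf("%d %f\n", i, h_out[i]);
        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }
    ```

    With this layout, main() could move to an ordinary .cpp file that only sees the prototypes of the two wrappers, since neither prototype mentions the __constant__ symbol or the kernel.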

  • 2021-01-12 15:27

    Below is a "minimum-sized" example showing the use of __constant__ symbols. You do not need to pass any pointer to the __global__ function.

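    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    // Constant-memory symbol, visible to every kernel in this file.
    __constant__ float test_const;

    // Each thread reads the constant directly; no pointer to it is passed in.
    __global__ void test_kernel(float* d_test_array) {
        d_test_array[threadIdx.x] = test_const;
    }

    int main(int argc, char **argv) {

        float test = 3.f;

        int N = 16;

        float* test_array = (float*)malloc(N*sizeof(float));

        float* d_test_array;
        cudaMalloc((void**)&d_test_array, N*sizeof(float));

        // Copy the host value into the __constant__ symbol before the launch.
        cudaMemcpyToSymbol(test_const, &test, sizeof(float));
        test_kernel<<<1,N>>>(d_test_array);

        // Issued in the default stream, so it runs after the kernel completes.
        cudaMemcpy(test_array, d_test_array, N*sizeof(float), cudaMemcpyDeviceToHost);

        for (int i=0; i<N; i++) printf("%i %f\n", i, test_array[i]);

        // Release device and host buffers.
        cudaFree(d_test_array);
        free(test_array);
        return 0;
    }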
    