Converting a colour image to a greyscale image using CUDA parallel processing

失恋的感觉 2021-02-04 19:10

I am trying to solve a problem in which I have to convert a colour image to a greyscale image, and I am using CUDA to do the conversion in parallel.

The kernel code i

12 answers
  • 2021-02-04 19:51

    I recently joined this course and tried your solution, but it didn't work, so I wrote my own. You were almost correct. The correct solution is this:

    __global__
    void rgba_to_greyscale(const uchar4* const rgbaImage,
                           unsigned char* const greyImage,
                           int numRows, int numCols)
    {
        int pos_x = (blockIdx.x * blockDim.x) + threadIdx.x;
        int pos_y = (blockIdx.y * blockDim.y) + threadIdx.y;
        if (pos_x >= numCols || pos_y >= numRows)
            return;
    
        uchar4 rgba = rgbaImage[pos_x + pos_y * numCols];
        greyImage[pos_x + pos_y * numCols] = (.299f * rgba.x + .587f * rgba.y + .114f * rgba.z);
    }
    

    The rest is the same as your code.
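
    One way to check a kernel like this is to compare its output with a plain CPU loop that uses the same weights. The following is a minimal host-side sketch, not part of the course code: `Rgba` is a hypothetical stand-in for CUDA's `uchar4`, and `reference_greyscale` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's uchar4 so the sketch builds without CUDA headers.
struct Rgba {
    unsigned char x, y, z, w;  // red, green, blue, alpha
};

// CPU reference: same weights and same truncating conversion as the kernel.
std::vector<unsigned char> reference_greyscale(const std::vector<Rgba>& image,
                                               std::size_t numRows, std::size_t numCols)
{
    assert(image.size() == numRows * numCols);
    std::vector<unsigned char> grey(image.size());
    for (std::size_t row = 0; row < numRows; ++row) {
        for (std::size_t col = 0; col < numCols; ++col) {
            const Rgba p = image[col + row * numCols];
            grey[col + row * numCols] =
                static_cast<unsigned char>(.299f * p.x + .587f * p.y + .114f * p.z);
        }
    }
    return grey;
}
```

    After copying the device result back to the host, each byte can be compared with the corresponding entry of this reference. A difference of one grey level may be worth tolerating, since the GPU and CPU are not guaranteed to round the float sum identically.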

  • 2021-02-04 19:51

    Compute the pixel coordinates from both the block index and the thread index:

    int x = (blockIdx.x * blockDim.x) + threadIdx.x;
    int y = (blockIdx.y * blockDim.y) + threadIdx.y;

    And set the block and grid sizes as follows:

    const dim3 blockSize(32, 32, 1);
    const dim3 gridSize((numCols/32+1), (numRows/32+1), 1);

    Code executed in 0.036992 ms.

  • 2021-02-04 19:53
    __global__
    void rgba_to_greyscale(const uchar4* const rgbaImage,
                           unsigned char* const greyImage,
                           int numRows, int numCols)
    {
        int rgba_x = blockIdx.x * blockDim.x + threadIdx.x;
        int rgba_y = blockIdx.y * blockDim.y + threadIdx.y;
    
        // The grid is rounded up, so threads past the image edge must not touch memory.
        if (rgba_x >= numCols || rgba_y >= numRows)
            return;
    
        int pixel_pos = rgba_x + rgba_y * numCols;
    
        uchar4 rgba = rgbaImage[pixel_pos];
        unsigned char gray = (unsigned char)(0.299f * rgba.x + 0.587f * rgba.y + 0.114f * rgba.z);
        greyImage[pixel_pos] = gray;
    }
    
    void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage, uchar4 * const d_rgbaImage,
                                unsigned char* const d_greyImage, size_t numRows, size_t numCols)
    {
        // 24 x 24 = 576 threads per block; the +1 adds a block to cover any remainder.
        const dim3 blockSize(24, 24, 1);
        const dim3 gridSize(numCols/24+1, numRows/24+1, 1);
        rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage, d_greyImage, numRows, numCols);
    
        cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
    }
    
  • 2021-02-04 19:55

    Since you do not know the image size in advance, it is best to choose a reasonable size for the two-dimensional thread block and then make sure two conditions hold. First, the pos_x and pos_y indices in the kernel must not exceed numCols and numRows, respectively. Second, the grid should be just large enough that the total number of threads across all blocks covers every pixel.

    const dim3 blockSize(16, 16, 1);
    const dim3 gridSize((numCols%16) ? numCols/16+1 : numCols/16,
    (numRows%16) ? numRows/16+1 : numRows/16, 1);
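
    The conditional expressions above are a rounded-up division. A minimal host-side sketch of the same arithmetic follows, assuming a hypothetical helper named `div_up`; it is plain C++ and should also work unchanged in the host part of a CUDA file.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: smallest number of blocks of size `block` that covers `n` pixels.
constexpr std::size_t div_up(std::size_t n, std::size_t block)
{
    return (n + block - 1) / block;
}

// The conditional form used in the answer; it gives the same result.
constexpr std::size_t div_up_conditional(std::size_t n, std::size_t block)
{
    return (n % block) ? n / block + 1 : n / block;
}

static_assert(div_up(64, 16) == 4, "exact multiple: no extra block");
static_assert(div_up(65, 16) == 5, "remainder: one extra block");
static_assert(div_up(65, 16) == div_up_conditional(65, 16), "both forms agree");
```

    With such a helper the grid could be written as `dim3 gridSize(div_up(numCols, 16), div_up(numRows, 16), 1)`. By contrast, `n/16 + 1` launches one unneeded block per dimension whenever `n` is an exact multiple of 16, which is harmless only if the kernel has a bounds check.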
    
  • 2021-02-04 19:55

    You are launching the following block and grid sizes:

      const dim3 blockSize(numCols/32, numCols/32 , 1);  //TODO
      const dim3 gridSize(numRows/12, numRows/12 , 1);  //TODO
    

    Yet your kernel code never uses the thread indices:

     int absolute_image_position_x = blockIdx.x;  
     int absolute_image_position_y = blockIdx.y;
    

    Think of it this way: the grid divides the width of the image into column strips, indexed by absolute_image_position_x, and the height into row strips, indexed by absolute_image_position_y. Each box formed where a column strip crosses a row strip is one block, and every pixel inside that box has to be written to greyImage in parallel. Enough spoilers for an assignment :)
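
    To see why the thread index matters, the launch can be emulated on the CPU. The sketch below is an illustration only: `count_written_pixels` is a hypothetical name, the square block and the 64 x 64 image in the usage note are made-up values, and the loops run serially rather than in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Emulates a 2D launch with square blocks and counts the distinct pixels written.
// useThreadIdx == false mimics a kernel that indexes by blockIdx only.
std::size_t count_written_pixels(int numRows, int numCols,
                                 int blockSide, int gridX, int gridY,
                                 bool useThreadIdx)
{
    assert(numRows > 0 && numCols > 0 && blockSide > 0);
    std::vector<bool> written(static_cast<std::size_t>(numRows) * numCols, false);
    for (int by = 0; by < gridY; ++by)
        for (int bx = 0; bx < gridX; ++bx)
            for (int ty = 0; ty < blockSide; ++ty)
                for (int tx = 0; tx < blockSide; ++tx) {
                    const int x = useThreadIdx ? bx * blockSide + tx : bx;
                    const int y = useThreadIdx ? by * blockSide + ty : by;
                    if (x >= numCols || y >= numRows)
                        continue;  // bounds check, as in a guarded kernel
                    written[static_cast<std::size_t>(y) * numCols + x] = true;
                }
    std::size_t count = 0;
    for (bool w : written)
        count += w ? 1 : 0;
    return count;
}
```

    For a 64 x 64 image with 32 x 32 blocks in a 2 x 2 grid, `count_written_pixels(64, 64, 32, 2, 2, false)` reports that only 4 pixels are touched, while passing `true` reaches all 4096.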

  • 2021-02-04 20:06

    Since posting this question I have kept working on the problem. I now realize that my initial solution was wrong, and a couple of changes are needed to get it right.
    Changes to be made:

     1. absolute_position_x =(blockIdx.x * blockDim.x) + threadIdx.x;
     2. absolute_position_y = (blockIdx.y * blockDim.y) + threadIdx.y;
    

    Secondly,

     1. const dim3 blockSize(24, 24, 1);
     2. const dim3 gridSize((numCols/16), (numRows/16) , 1);
    

    This solution uses a grid of numCols/16 * numRows/16 blocks
    and a block size of 24 * 24 threads.

    Code executed in 0.040576 ms.
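
    A grid computed with a divisor of 16 and a block side of 24 covers the image only because each block is wider than the divisor. The sketch below checks that arithmetic along one dimension. The helper names `launch_covers` and `surplus_threads` are hypothetical, and the widths 512 and 31 are made-up examples, since the assignment's image size is not stated here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical check: do `gridBlocks` blocks, each `blockSide` threads wide,
// reach every one of the `n` pixels along one dimension?
constexpr bool launch_covers(std::size_t n, std::size_t blockSide, std::size_t gridBlocks)
{
    return blockSide * gridBlocks >= n;
}

// Threads launched along one dimension that fall outside the image.
constexpr std::size_t surplus_threads(std::size_t n, std::size_t blockSide, std::size_t gridBlocks)
{
    return launch_covers(n, blockSide, gridBlocks) ? blockSide * gridBlocks - n : 0;
}

static_assert(launch_covers(512, 24, 512 / 16), "24 * 32 = 768 threads cover 512 columns");
static_assert(surplus_threads(512, 24, 512 / 16) == 256, "256 thread columns lie past the edge");
static_assert(!launch_covers(31, 24, 31 / 16), "24 * 1 = 24 threads miss columns 24..30");
```

    The surplus threads index past the image edge, so a configuration like this relies on a bounds check in the kernel, and it can miss pixels for images narrower than 32 pixels.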

    @datenwolf : thanks for answering above!!!
