colored image to greyscale image using CUDA parallel processing

前端未结

关注

 12  1457

I am trying to solve a problem in which i am supposed to change a colour image to a greyscale image. For this purpose i am using CUDA parallel approach.

The kerne code i

相关标签:

12条回答

醉话见心

2021-02-04 19:51

I recently joined this course and tried your solution but it don't work so, i tried my own. You are almost correct. The correct solution is this:

__global__`
void rgba_to_greyscale(const uchar4* const rgbaImage,
               unsigned char* const greyImage,
               int numRows, int numCols)
{`

int pos_x = (blockIdx.x * blockDim.x) + threadIdx.x;
int pos_y = (blockIdx.y * blockDim.y) + threadIdx.y;
if(pos_x >= numCols || pos_y >= numRows)
    return;

uchar4 rgba = rgbaImage[pos_x + pos_y * numCols];
greyImage[pos_x + pos_y * numCols] = (.299f * rgba.x + .587f * rgba.y + .114f * rgba.z); 

}

The rest is same as your code.

0 讨论(0)

眼角桃花

2021-02-04 19:51

1- int x =(blockIdx.x * blockDim.x) + threadIdx.x;

2- int y = (blockIdx.y * blockDim.y) + threadIdx.y;

And in grid and block size

1- const dim3 blockSize(32, 32, 1);

2- const dim3 gridSize((numCols/32+1), (numRows/32+1) , 1);

Code executed in 0.036992 ms.

0 讨论(0)
发布评论:

提交评论
- 加载中...

[愿得一人]

2021-02-04 19:53

__global__
void rgba_to_greyscale(const uchar4* const rgbaImage,
                       unsigned char* const greyImage,
                       int numRows, int numCols)
{
    int rgba_x = blockIdx.x * blockDim.x + threadIdx.x;
    int rgba_y = blockIdx.y * blockDim.y + threadIdx.y;
    int pixel_pos = rgba_x+rgba_y*numCols;

    uchar4 rgba = rgbaImage[pixel_pos];
    unsigned char gray = (unsigned char)(0.299f * rgba.x + 0.587f * rgba.y + 0.114f * rgba.z);
    greyImage[pixel_pos] = gray;
}

void your_rgba_to_greyscale(const uchar4 * const h_rgbaImage, uchar4 * const d_rgbaImage,
                            unsigned char* const d_greyImage, size_t numRows, size_t numCols)
{
    //You must fill in the correct sizes for the blockSize and gridSize
    //currently only one block with one thread is being launched
    const dim3 blockSize(24, 24, 1);  //TODO
    const dim3 gridSize( numCols/24+1, numRows/24+1, 1);  //TODO
    rgba_to_greyscale<<<gridSize, blockSize>>>(d_rgbaImage, d_greyImage, numRows, numCols);

    cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
}

0 讨论(0)

天命终不由人

2021-02-04 19:55
Since you are not aware of the image size. It is best to choose any reasonable dimension of the two-dimensional block of threads and then check for two conditions. The first one is that the pos_x and pos_y indexes in the kernel do not exceed numRows and numCols. Secondly the grid size should be just above the total number of threads in all the blocks.
```
const dim3 blockSize(16, 16, 1);
const dim3 gridSize((numCols%16) ? numCols/16+1 : numCols/16,
(numRows%16) ? numRows/16+1 : numRows/16, 1);
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
逝去的感伤

2021-02-04 19:55
You are running following number of block and grids:
```
  const dim3 blockSize(numCols/32, numCols/32 , 1);  //TODO
  const dim3 gridSize(numRows/12, numRows/12 , 1);  //TODO
```
yet you are not using any threads in your kernel code!
```
 int absolute_image_position_x = blockIdx.x;  
 int absolute_image_position_y = blockIdx.y;
```
think this way, the width of an image can be divide into absolute_image_position_x parts of column and the height of an image can be divide into absolute_image_position_y parts of row. Now the box each of the cross section it creates you need to change/redraw all the pixels in terms of greyImage, parallely. Enough spoiler for an assignment :)
0 讨论(0)
发布评论:

提交评论
- 加载中...
借酒劲吻你

2021-02-04 20:06
Now, since I posted this question I have been continuously working on this problem
there are a couple of improvements that should be done in order to get this problem correct now I realize my initial solution was wrong .
Changes to be done:-
```
 1. absolute_position_x =(blockIdx.x * blockDim.x) + threadIdx.x;
 2. absolute_position_y = (blockIdx.y * blockDim.y) + threadIdx.y;
```
Secondly,
```
 1. const dim3 blockSize(24, 24, 1);
 2. const dim3 gridSize((numCols/16), (numRows/16) , 1);
```
In the solution we are using a grid of numCols/16 * numCols/16
and blocksize of 24 * 24

code executed in 0.040576 ms

@datenwolf : thanks for answering above!!!
0 讨论(0)
发布评论:

提交评论
- 加载中...

1 2 下一页