Iterating through a 2D array in PyCUDA

清酒与你 2021-01-29 12:03

I am trying to iterate through a 2D array in PyCUDA, but I end up with repeated array values. I initially pass in a small random integer array and that works as expected, but when I

1 answer
  • 2021-01-29 12:08

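    The problem here is that the image you are loading doesn't have its pixel values stored as signed integers. OpenCV loads it as unsigned 8-bit values (uint8), so the kernel has to read it through an unsigned char pointer. This modification of your example works more as expected: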

    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule
    import numpy as np
    import cv2 
    
    import pycuda.autoinit
    
    # Flag 0 loads the image as single-channel grayscale
    img = cv2.imread('Chest.jpg', 0)
    img_size = img.shape
    print(img_size)
    print(img.dtype)
    
    # nbytes is the number of bytes occupied by the numpy array img
    img_gpu = cuda.mem_alloc(img.nbytes)
    # Copy the image from host (CPU) memory to device (GPU) memory
    cuda.memcpy_htod(img_gpu, img)
    
    mod = SourceModule("""
    #include <stdio.h>
    __global__ void AHE(unsigned char *a, int row, int col)
    {
        int i = threadIdx.x + blockIdx.x * blockDim.x;
        int j = threadIdx.y + blockIdx.y * blockDim.y;
        if (i == 0 && j == 0)
            printf("Output array ");
        if (i < row && j < col)
        {
            int val = int(a[j + i*col]);
            printf(" %d", val);
        }
    }
    """)
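    # Number of columns and rows, as 32-bit ints to match the kernel's int parameters
    col = np.int32(img.shape[-1])
    row = np.int32(img.shape[0])
    func = mod.get_function("AHE")
    # No grid is given, so only a single 32 x 32 block is launched
    func(img_gpu, row, col, block=(32,32,1))
    img_ahe = np.empty_like(img)
    # Copy the (unmodified) image back from the GPU to the host
    cuda.memcpy_dtoh(img_ahe, img_gpu)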
    

    When run, the code emits this:

    $ python image.py 
    (681, 1024)
    uint8
    Output array  244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 245 245 245 246 246 246 246 246 246 246 246 246 246 246 244 244 244 244 244 244 244 244 245 245 245 245 245 245 245 245 244 244 245 245 245 246 246 246 
    

    [Output clipped for brevity]
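
    Only a 32 x 32 corner of the image is printed, because the launch passes block=(32,32,1) and no grid, so PyCUDA falls back to its default grid of a single block. A minimal sketch of computing a grid that would cover the whole image follows. The helper name grid_for is hypothetical, and the image size is the one printed above.

```python
def grid_for(rows, cols, block=(32, 32, 1)):
    """Number of blocks needed so the threads cover every (row, col) pair.

    Hypothetical helper, not part of PyCUDA.
    """
    block_x, block_y, _ = block
    # In the kernel, x maps to the row index i and y to the column index j,
    # so the x dimension must cover the rows and the y dimension the columns.
    # Ceiling division rounds up so a partial last block is still launched.
    return ((rows + block_x - 1) // block_x,
            (cols + block_y - 1) // block_y)


grid = grid_for(681, 1024)
print(grid)
```

    The result could then be passed to the launch as func(img_gpu, row, col, block=(32,32,1), grid=grid). The kernel's existing bounds check (i < row && j < col) takes care of the threads in the partial last block. With the whole image covered, every pixel would be printed, so the printf calls would probably be better removed first.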

    Note the dtype of the image: uint8. Your code is attempting to treat the stream of unsigned 8-bit values as 32-bit integers. It should technically generate a runtime error on a full image, because the kernel reads 4 bytes per pixel instead of 1 and will therefore read beyond the end of the image. However, you don't see this because you only run a single block, and your input image is presumably at least four times larger than the 32 x 32 block you run.
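
    The effect can be reproduced on the host without a GPU. The sketch below is only an illustration under stated assumptions: the pixel values are made up, the original kernel is assumed to have declared its argument as int *a with the same a[j + i*col] indexing, and int is assumed to be 4 bytes, little-endian. The helper bytes_touched is hypothetical.

```python
import struct

# Eight made-up grayscale pixels, one unsigned byte each (like a uint8 image row)
pixels = bytes([244, 244, 244, 244, 245, 245, 246, 246])

# Correct view: one value per byte, which is what `unsigned char *a` sees
as_uint8 = list(pixels)

# Mismatched view: every 4 bytes fused into one little-endian 32-bit int,
# which is what a kernel declared with `int *a` would see (assumption)
as_int32 = list(struct.unpack("<%di" % (len(pixels) // 4), pixels))

print(as_uint8)
print(as_int32)


def bytes_touched(rows_launched, cols_launched, col, bytes_per_element):
    """Bytes from the start of the buffer up to the end of the last element
    read by a[j + i*col] over the launched threads (hypothetical helper)."""
    max_index = (cols_launched - 1) + (rows_launched - 1) * col
    return (max_index + 1) * bytes_per_element


rows, cols = 681, 1024
image_bytes = rows * cols  # 1 byte per uint8 pixel

# A single 32 x 32 block reading 4-byte ints stays inside the allocation
one_block_in_bounds = bytes_touched(32, 32, cols, 4) <= image_bytes
# Covering the full image with 4-byte reads would run past the end
full_image_in_bounds = bytes_touched(rows, cols, cols, 4) <= image_bytes

print(one_block_in_bounds, full_image_in_bounds)
```

    The second half is the arithmetic behind the out-of-bounds remark: reading 4 bytes per element makes the kernel walk four times as far into the buffer as the uint8 image actually extends.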

    Incidentally, PyCUDA is extremely good at managing and enforcing type safety for CUDA calls, but your code neatly defeats every mechanism by which PyCUDA could detect a type mismatch in the kernel call. PyCUDA includes an excellent GPUArray class (pycuda.gpuarray.GPUArray). You should familiarise yourself with it. If you had used a GPUArray instance here, you would have gotten type mismatch runtime errors, which would have alerted you to the exact source of the problem the first time you tried to run it.
