I am trying to iterate through a 2D array in PyCUDA but I end up with repeated array values. I initially throw a small random integer array and that works as expected but when I
The problem here is that the image you are loading doesn't have pixel values stored as signed integers. This modification of your example works more as expected:
import pycuda.driver as cuda
from pycuda.compiler import SourceModule
import numpy as np
import cv2
import pycuda.autoinit
img = cv2.imread('Chest.jpg',0)
img_size=img.shape
print(img_size)
print(img.dtype)
# nbytes gives the number of bytes occupied by the numpy array img
img_gpu = cuda.mem_alloc(img.nbytes)
#Copies the memory from CPU to GPU
cuda.memcpy_htod(img_gpu, img)
mod = SourceModule("""
#include <stdio.h>
__global__ void AHE(unsigned char *a, int row, int col)
{
int i = threadIdx.x+ blockIdx.x* blockDim.x;
int j = threadIdx.y+ blockIdx.y* blockDim.y;
if(i==0 && j ==0)
printf("Output array ");
if(i <row && j < col)
{
int val = int(a[j + i*col]);
printf(" %d", val);
}
}
""")
#Gives you the number of columns
col = np.int32(img.shape[-1])
row = np.int32(img.shape[0])
func = mod.get_function("AHE")
func(img_gpu, row, col, block=(32,32,1))
img_ahe = np.empty_like(img)
cuda.memcpy_dtoh(img_ahe, img_gpu)
When run, the code emits this:
$ python image.py
(681, 1024)
uint8
Output array 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 245 245 245 246 246 246 246 246 246 246 246 246 246 246 244 244 244 244 244 244 244 244 245 245 245 245 245 245 245 245 244 244 245 245 245 246 246 246
[Output clipped for brevity]
Note the dtype of the image: uint8. Your code is attempting to treat the stream of unsigned 8-bit values as 32-bit integers. It should technically generate a runtime error on a full image, because the kernel reads 4 bytes per pixel instead of 1 and so runs past the end of the image buffer. However, you don't see this because you only run a single block, and your input image is presumably at least four times larger than the 32 x 32 size of the block you run.
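The size mismatch can be reproduced on the host with NumPy alone, without a GPU. This is only an illustrative sketch: the small array below is a made-up stand-in for the real image, and viewing its bytes as int32 mimics what a kernel declared with an int pointer does to a uint8 buffer.

```python
import numpy as np

# Made-up stand-in for the image: 2 rows x 8 columns of unsigned 8-bit pixels.
img = np.array([[244, 244, 245, 246, 1, 2, 3, 4],
                [10, 20, 30, 40, 50, 60, 70, 80]], dtype=np.uint8)

# Reinterpret the same bytes as 32-bit integers, as an `int *a` kernel would.
as_int32 = img.reshape(-1).view(np.int32)

bytes_allocated = img.nbytes      # what cuda.mem_alloc(img.nbytes) reserves
bytes_int_kernel = img.size * 4   # what indexing every pixel as int would touch

print(bytes_allocated, bytes_int_kernel)
print(as_int32.size, "int-sized elements instead of", img.size, "pixels")
```

Each int-sized element packs four neighbouring pixels into one value, so the numbers printed by such a kernel bear no resemblance to individual pixel intensities, and only a quarter of the indices are within bounds.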
Incidentally, PyCUDA is extremely good at managing and enforcing type safety for CUDA calls, but your code neatly defeats every mechanism by which PyCUDA could detect a type mismatch in the kernel call. PyCUDA includes an excellent GPUArray class. You should familiarise yourself with it. If you had used a GPUArray instance here, you would have gotten type mismatch runtime errors which would have alerted you to the exact source of the problem the first time you tried to run it.
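When working with raw allocations and memcpy calls, a manual dtype check on the host before the launch catches the same class of bug. The following is a minimal sketch; check_kernel_dtype is a hypothetical helper written for this illustration and is not part of PyCUDA, and the zero-filled array stands in for the image returned by cv2.imread.

```python
import numpy as np

def check_kernel_dtype(array, expected_dtype):
    """Hypothetical helper: raise if the array's dtype differs from the kernel's element type."""
    expected = np.dtype(expected_dtype)
    if array.dtype != expected:
        raise TypeError("kernel expects %s but array holds %s" % (expected, array.dtype))
    return array

# Stand-in for the uint8 image loaded with cv2.imread(..., 0).
img = np.zeros((4, 4), dtype=np.uint8)

# Matches a kernel declared with `unsigned char *a`: passes silently.
check_kernel_dtype(img, np.uint8)

# What a kernel declared with `int *a` would assume: rejected before any launch.
try:
    check_kernel_dtype(img, np.int32)
    mismatch_detected = False
except TypeError as exc:
    mismatch_detected = True
    print(exc)
```

Calling such a check immediately before cuda.memcpy_htod keeps the expected element type next to the kernel launch, where a mismatch is easiest to spot.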