Question
I have code that I want to parallelize with cupy. I thought it would be straightforward: just write "import cupy as cp" and replace every np. with cp., and it would work.
And it does work, the code runs, but it is much slower. I expected it would eventually be faster than numpy when iterating through larger arrays, but that never seems to happen.
The code is:
import numpy as np

q = np.zeros((5, 5))
q[:, 0] = 20

def foo(array):
    result = array            # note: aliases the input, so q is modified in place
    shedding_row = array * 0
    for i in range(array.shape[0]):
        for j in range(array.shape[1] - 1):
            shedding_param = 2 * (result[i, j]) ** .5
            shedding = np.random.poisson(shedding_param, 1)[0]
            # never shed more than the current value
            if shedding >= result[i, j]:
                shedding = result[i, j] - 1
            result[i, j + 1] = result[i, j] - shedding
            if result[i, j + 1] < 0:
                result[i, j + 1] = 0
            shedding_row[i, j + 1] = shedding
    return result, shedding_row

x, y = foo(q)
Is this supposed to get faster with cupy? Am I using it wrong?
Answer 1:
To get fast performance from numpy or cupy, you should use vectorized array operations instead of Python for loops.
Just for example,
for i in range(array.shape[0]):
    for j in range(array.shape[1] - 1):
        shedding_param = 2 * (result[i, j]) ** .5
This can be computed in a single vectorized step:
xp = np  # change to cp to run on the GPU with cupy
shedding_param = 2 * xp.sqrt(result[:, :-1])
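In this particular loop the j dimension cannot be fully vectorized, because column j+1 depends on column j, but the i loop over rows can be. Here is a minimal sketch of that idea, assuming cupy.random.poisson accepts an array-valued lam the way numpy's does (the helper name foo_vectorized is mine, not from the original code):

import numpy as np
# import cupy as cp   # and set xp = cp below to run on the GPU

xp = np

def foo_vectorized(array):
    result = array.copy()               # avoid mutating the caller's array
    shedding_row = xp.zeros_like(array)
    # the j loop stays sequential (column j+1 depends on column j),
    # but each step now processes every row at once
    for j in range(array.shape[1] - 1):
        shedding_param = 2 * xp.sqrt(result[:, j])
        shedding = xp.random.poisson(shedding_param)   # one draw per row
        # never shed more than the current value
        shedding = xp.where(shedding >= result[:, j], result[:, j] - 1, shedding)
        result[:, j + 1] = xp.maximum(result[:, j] - shedding, 0)
        shedding_row[:, j + 1] = shedding
    return result, shedding_row

x, y = foo_vectorized(q)

Note that with a 5x5 input the GPU kernel-launch overhead will still dominate; cupy generally only pays off once each vectorized step touches many thousands of elements.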
Source: https://stackoverflow.com/questions/57386330/cupy-slower-than-numpy-when-iterating-through-array