Question
I am trying to use Numba and access the GPU in order to accelerate the code, but I get the following error:
in jit
    raise NotImplementedError("bounds checking is not supported for CUDA")
NotImplementedError: bounds checking is not supported for CUDA
I saw that a similar question had been raised before, but it was neither fully specified nor answered.
I implemented the two nested for loops after seeing that the vectorized code (y = corr*x + np.sqrt(1.-corr**2)*z) did not work (same error). I also tried changing the boundscheck option, but this did not change the outcome.
The error did not appear when I did not specify the target, since the code then runs on the CPU automatically (I guess).
import numpy as np
from numba import jit

N = int(1e8)

@jit(nopython=True, target='cuda', boundscheck=False)
def Brownian_motions(T, N, corr):
    x = np.random.normal(0, 1, size=(T,N))
    z = np.random.normal(0, 1, size=(T,N))
    y = np.zeros(shape=(T,N))
    for i in range(T):
        for j in range(N):
            y[i,j] = corr*x[i,j] + np.sqrt(1.-corr**2)*z[i,j]
    return(x,y)

x, y = Brownian_motions(T = 500, N = N, corr = -0.45)
Could you please help me? Python is 3.7.6 and Numba is 0.48.0.
Answer 1:
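In my case I also replaced the decorator with plain @jit, which compiles the function to machine code using LLVM. Without target='cuda' the compiled code runs on the CPU, so the example below compares interpreted Python with JIT-compiled CPU code rather than CPU with GPU.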
from numba import jit
import numpy as np
# to measure execution time
from timeit import default_timer as timer

# plain Python function, run by the interpreter on the CPU
def func(a):
    for i in range(10000000):
        a[i] += 1

# same function compiled by Numba; target="cuda" is omitted,
# so the compiled code also runs on the CPU
@jit
def func2(a):
    for i in range(10000000):
        a[i] += 1

if __name__ == "__main__":
    n = 10000000
    a = np.ones(n, dtype=np.float64)
    b = np.ones(n, dtype=np.float32)

    start = timer()
    func(a)
    print("without GPU:", timer() - start)

    # labelled "with GPU", but this times the JIT-compiled CPU version
    # (the first call also includes compilation time)
    start = timer()
    func2(a)
    print("with GPU:", timer() - start)
Result:
without GPU: 5.353004818000045
with GPU: 0.23115529000006063
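The first call to a Numba-compiled function includes the compilation itself, so timing a second call as well gives a fairer picture. Here is a minimal sketch of that idea. The names add_one and time_call are made up for illustration and are not part of the answer above, and the try/except fallback is only there so the snippet still runs where Numba is not installed (in which case there is no speed-up).

```python
from timeit import default_timer as timer

import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for illustration: without Numba, fall back to a no-op
    # decorator so the sketch still runs (as plain interpreted Python).
    def njit(func):
        return func


@njit
def add_one(a):
    # Increment every element in place.
    for i in range(a.shape[0]):
        a[i] += 1


def time_call(func, arr):
    # Return the wall-clock time of a single call.
    start = timer()
    func(arr)
    return timer() - start


a = np.ones(200_000, dtype=np.float64)
first = time_call(add_one, a)   # includes JIT compilation when Numba is present
second = time_call(add_one, a)  # runs the already compiled code
print("first call: ", first)
print("second call:", second)
```

With Numba installed, the second timing is the one to compare against the pure Python loop.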
Answer 2:
Replace @jit(nopython=True, target='cuda', boundscheck=False) with plain @jit, which compiles the function for the CPU:
import numpy as np
from numba import jit

N = int(1e8)

@jit
def Brownian_motions(T, N, corr):
    x = np.random.normal(0, 1, size=(T,N))
    z = np.random.normal(0, 1, size=(T,N))
    y = np.zeros(shape=(T,N))
    for i in range(T):
        for j in range(N):
            y[i,j] = corr*x[i,j] + np.sqrt(1.-corr**2)*z[i,j]
    return(x,y)

x, y = Brownian_motions(T = 500, N = N, corr = -0.45)
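Note that plain @jit runs on the CPU. If the goal is still to use the GPU, one possible route is an explicit kernel written with numba.cuda.jit, with the random inputs generated on the host. This is only a sketch under assumptions, not the asker's or the answerer's code: correlate_kernel, correlate and the 16x16 block size are chosen for illustration, the arrays are deliberately small, and the function falls back to NumPy when no CUDA device is available.

```python
import math

import numpy as np

try:
    from numba import cuda
    HAVE_CUDA = cuda.is_available()
except ImportError:
    HAVE_CUDA = False

if HAVE_CUDA:
    @cuda.jit
    def correlate_kernel(x, z, y, corr):
        # One thread per (i, j) element; the guard skips threads
        # that fall outside the array.
        i, j = cuda.grid(2)
        if i < y.shape[0] and j < y.shape[1]:
            y[i, j] = corr * x[i, j] + math.sqrt(1.0 - corr * corr) * z[i, j]


def correlate(x, z, corr):
    # Hypothetical helper: GPU path if a device exists, NumPy otherwise.
    if not HAVE_CUDA:
        return corr * x + np.sqrt(1.0 - corr ** 2) * z
    y = np.empty_like(x)
    threads = (16, 16)  # assumed block size, not tuned
    blocks = (math.ceil(x.shape[0] / threads[0]),
              math.ceil(x.shape[1] / threads[1]))
    # Host arrays are copied to the device and back automatically.
    correlate_kernel[blocks, threads](x, z, y, corr)
    return y


rng = np.random.RandomState(0)
x = rng.normal(0, 1, size=(50, 2000))
z = rng.normal(0, 1, size=(50, 2000))
y = correlate(x, z, -0.45)
# The sample correlation of x and y should be close to corr.
print(np.corrcoef(x.ravel(), y.ravel())[0, 1])
```

For arrays this small the host-to-device copies will likely dominate, so the GPU path is only worth it for much larger inputs.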
Source: https://stackoverflow.com/questions/60117150/bounds-checking-is-not-supported-for-cuda