Question
The Numba documentation has an example of a race condition in a parallel loop:
import numba as nb
import numpy as np

@nb.njit(parallel=True)
def prange_wrong_result(x):
    n = x.shape[0]
    y = np.zeros(4)
    for i in nb.prange(n):
        y[:] += x[i]
    return y
I ran it, and it indeed outputs an abnormal result such as:
prange_wrong_result(np.ones(10000))
#array([5264., 5273., 5231., 5234.])
Then I changed the loop to:
import numba as nb
import numpy as np

@nb.njit(parallel=True)
def prange_wrong_result(x):
    n = x.shape[0]
    y = np.zeros(4)
    for i in nb.prange(n):
        y += x[i]
    return y
and it outputs
prange_wrong_result(np.ones(10000))
#array([10000., 10000., 10000., 10000.])
I have read some explanations of race conditions, but I still don't understand:
- Why does the second example have no race condition? What is the difference between y[:]= and y=?
- Why are the four output elements in the first example not the same?
Answer 1:
In your first example, multiple threads/processes share the same array and both read from and assign to it. The statement y[:] += x[i] is roughly equivalent to:
y[0] += x[i]
y[1] += x[i]
y[2] += x[i]
y[3] += x[i]
In fact, += is just syntactic sugar for a read, an addition, and an assignment, so y[0] += x[i] is in fact:
_value = y[0]
_value = _value + x[i]
y[0] = _value
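The loop body is executed simultaneously by multiple threads/processes, and that is where the race condition comes in. The race-condition example on Wikipedia applies here: two threads each read the same old value, add to it, and write the result back, so one of the two updates is lost.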
That's why the returned array contains wrong values and why each element might be different: it is simply non-deterministic which thread/process runs when. So in some cases there is a race condition on one element, sometimes on none, and sometimes on multiple elements.
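To make the lost update concrete, here is a minimal sketch that replays one possible interleaving by hand for a single element. It uses no threads at all; the names a_value and b_value are hypothetical stand-ins for the temporary value each of two threads would hold, and the order of the statements is an assumed schedule, not something Numba guarantees.

```python
# Deterministic replay of one lost-update interleaving for a single element.
# Two hypothetical threads, A and B, each want to perform y[0] += 1.0.
y = [0.0]

a_value = y[0]            # thread A reads 0.0
b_value = y[0]            # thread B reads 0.0 before A has written back
a_value = a_value + 1.0   # thread A adds its contribution
b_value = b_value + 1.0   # thread B adds its contribution
y[0] = a_value            # thread A writes 1.0
y[0] = b_value            # thread B writes 1.0, overwriting A's update

print(y[0])  # 1.0, although two increments were performed
```

With a different schedule (A finishes its write before B reads), the result would be 2.0, which is why the totals in the first example fall short of 10000 by a different amount for each element.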
However, the Numba developers have implemented some supported reductions in which no race condition occurs. One of them is y +=. The important point is that the operation is applied to the variable itself rather than to a slice or element of it. In that case Numba does something very clever: it copies the initial value of the variable for each thread/process and then operates on that copy. After the parallel loop has finished, it adds up the copied values. Taking your second example and assuming it used 2 processes, it would look roughly like this:
y = np.zeros(4)
y_1 = y.copy()
y_2 = y.copy()
for i in nb.prange(n):
    if is_process_1:
        y_1[:] += x[i]
    if is_process_2:
        y_2[:] += x[i]
y += y_1
y += y_2
Since each thread has its own array, there is no potential for a race condition. For Numba to be able to deduce this, you have to follow its restrictions. The documentation states that Numba creates race-condition-free parallel code for += on scalars and arrays (y += x[i]), but not on array elements/slices (y[:] += x[i] or y[1] += x[i]).
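The private-copy idea can also be emulated in plain NumPy, which may help when reasoning about what the reduction computes. This is only a sequential sketch and not Numba's actual implementation: emulate_reduction, n_workers, and partials are hypothetical names, the split of indices into contiguous chunks is an assumption, and each private accumulator starts at zero here so that the initial value of y is counted only once.

```python
import numpy as np

def emulate_reduction(x, n_workers=2):
    """Sequentially emulate a per-worker reduction of y += x[i]."""
    y = np.zeros(4)
    # One private accumulator per hypothetical worker; nothing is shared.
    partials = [np.zeros_like(y) for _ in range(n_workers)]
    # Assumed schedule: each worker gets one contiguous chunk of indices.
    chunks = np.array_split(np.arange(x.shape[0]), n_workers)
    for worker, chunk in enumerate(chunks):
        for i in chunk:
            partials[worker] += x[i]  # only this worker touches this array
    # Combine step, run once after all workers have finished.
    for partial in partials:
        y += partial
    return y

result = emulate_reduction(np.ones(10000))
print(result)  # [10000. 10000. 10000. 10000.]
```

Because every accumulation goes into an array owned by a single worker, the order in which workers run cannot change the result; the only shared write is the final combine step, which happens after the loop.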
Source: https://stackoverflow.com/questions/59596794/understanding-this-race-condition-in-numba-parallelization