Understanding this race condition in numba parallelization

Question

There is an example in the Numba documentation of a race condition in a parallel loop:

import numba as nb
import numpy as np
@nb.njit(parallel=True)
def prange_wrong_result(x):
    n = x.shape[0]
    y = np.zeros(4)
    for i in nb.prange(n):
        y[:]+= x[i]
    return y

I ran it, and it does indeed give an incorrect result, for example:

prange_wrong_result(np.ones(10000))
#array([5264., 5273., 5231., 5234.])

Then I changed the loop to:

import numba as nb
import numpy as np
@nb.njit(parallel=True)
def prange_wrong_result(x):
    n = x.shape[0]
    y = np.zeros(4)
    for i in nb.prange(n):
        y+= x[i]
    return y

and it outputs:

prange_wrong_result(np.ones(10000))
#array([10000., 10000., 10000., 10000.])

I have read some explanations of race conditions, but I still don't understand:

Why does the second example have no race condition? What is the difference between y[:] += and y +=?
Why are the four elements of the output in the first example not all the same?

Answer 1:

In your first example, multiple threads share the same array y, and each of them reads from and writes to it. The statement y[:] += x[i] is roughly equivalent to:

y[0] += x[i]
y[1] += x[i]
y[2] += x[i]
y[3] += x[i]

In fact, += is shorthand for three separate operations (a read, an addition, and an assignment), so y[0] += x[i] really amounts to:

_value = y[0]
_value = _value + x[i]
y[0] = _value
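To make the interleaving concrete, here is a minimal pure-Python sketch that replays by hand one unlucky schedule of two threads, each running the three steps above on the same element. It is only an illustration of the lost-update pattern, not what Numba actually generates; the function names replay_lost_update and replay_serial are hypothetical.

```python
def replay_lost_update():
    """Replay one interleaving of two threads that each execute
    `y[0] += 1` as a separate read, add and write."""
    y = [0.0]

    # Both threads read y[0] before either one writes back.
    value_t1 = y[0]          # thread 1 reads 0.0
    value_t2 = y[0]          # thread 2 reads 0.0

    value_t1 = value_t1 + 1  # thread 1 computes 1.0
    value_t2 = value_t2 + 1  # thread 2 computes 1.0

    y[0] = value_t1          # thread 1 writes 1.0
    y[0] = value_t2          # thread 2 overwrites it with 1.0

    return y[0]


def replay_serial():
    """The same two updates when one thread finishes before the other starts."""
    y = [0.0]
    for _ in range(2):       # thread 1, then thread 2
        value = y[0]
        value = value + 1
        y[0] = value
    return y[0]


print(replay_lost_update())  # 1.0: one of the two additions is lost
print(replay_serial())       # 2.0: the expected result
```

Nothing is wrong with any single step; the error comes purely from the order in which the steps of different threads are interleaved.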

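The loop body is executed simultaneously by multiple threads, and that is where the race condition comes in. The classic example from the Wikipedia article on race conditions applies here: if two threads both read y[0] before either has written its result back, both write back the same value, and one of the two additions is lost.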

That is why the returned array contains wrong values, and why each element can end up different: which thread runs when is non-deterministic, and the updates to each element race independently. In a given run, an updated element may lose many updates, a few, or none at all.
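The per-element non-determinism can be reproduced with a toy scheduler. The sketch below is a deliberately simplified model, assumed for illustration only: each y[j] += 1 is split into a read step and a write step, and a seeded random scheduler picks which thread runs its next step. The function names simulate_element and simulate_array are hypothetical, and the exact totals depend entirely on this model; they are not meant to reproduce the numbers in the question (real threads run in much larger uninterrupted stretches, so they lose far fewer updates).

```python
import random


def simulate_element(n_threads, increments_per_thread, rng):
    """Simulate one array element updated by several threads, where each
    `y[j] += 1` is a read step followed by a write step and a random
    scheduler decides which thread runs its next step."""
    value = 0
    remaining = [increments_per_thread] * n_threads  # increments left per thread
    register = [None] * n_threads                    # value each thread has read
    while any(remaining):
        t = rng.choice([i for i in range(n_threads) if remaining[i]])
        if register[t] is None:
            register[t] = value          # read step
        else:
            value = register[t] + 1      # add + write step (may overwrite others)
            register[t] = None
            remaining[t] -= 1
    return value


def simulate_array(n_elements=4, n_threads=4, increments_per_thread=2500, seed=0):
    """Simulate every element independently with the same random scheduler."""
    rng = random.Random(seed)
    return [simulate_element(n_threads, increments_per_thread, rng)
            for _ in range(n_elements)]


# Four totals, each at most 10000 and typically all different from each other.
print(simulate_array())
```

With n_threads=1 every read is immediately followed by its own write, so no update is ever lost and each element reaches exactly increments_per_thread.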

However, the Numba developers have implemented a set of supported reductions for which no race condition occurs, and y += is one of them. What matters is that the target is the variable itself, not a slice or element of it. In that case Numba does something clever: it gives each thread a private copy of the variable, initialized to the identity of the operator (zeros for +=), and each thread updates only its own copy. After the parallel loop finishes, the private copies are added into the original variable. Assuming your second example ran on two threads, it would look roughly like this:

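y = np.zeros(4)
y_1 = np.zeros_like(y)   # thread 1's private copy, starts at 0 (the identity of +)
y_2 = np.zeros_like(y)   # thread 2's private copy
for i in range(n):
    if i < n // 2:       # first half of the iterations runs on thread 1
        y_1[:] += x[i]
    else:                # second half runs on thread 2
        y_2[:] += x[i]
y += y_1                 # partial results are combined only after the loop
y += y_2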
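The same privatization idea can be tried with real threads. The following is a minimal sketch using the standard library's ThreadPoolExecutor, not Numba's actual implementation; the function name private_copy_sum and its parameters are hypothetical, and plain lists stand in for NumPy arrays so it runs with the standard library alone.

```python
from concurrent.futures import ThreadPoolExecutor


def private_copy_sum(x, n_out=4, n_threads=2):
    """Emulate `for i in prange(n): y += x[i]` with per-thread private copies.

    Each worker accumulates its chunk of iterations into its own list that
    starts at the identity of + (zeros); the partial results are combined
    into y only after all workers have finished.
    """
    n = len(x)
    y = [0.0] * n_out
    # Contiguous chunks of iterations, one per worker.
    bounds = [(k * n // n_threads, (k + 1) * n // n_threads)
              for k in range(n_threads)]

    def worker(lo, hi):
        private = [0.0] * n_out        # this worker's private copy
        for i in range(lo, hi):
            for j in range(n_out):
                private[j] += x[i]     # no other thread touches `private`
        return private

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(lambda b: worker(*b), bounds))

    for private in partials:           # combine step, after the loop
        for j in range(n_out):
            y[j] += private[j]
    return y


print(private_copy_sum([1.0] * 10000))  # [10000.0, 10000.0, 10000.0, 10000.0]
```

Because no two workers ever write to the same list, the result does not depend on how the threads are scheduled; only the final combine step touches y, and it runs on a single thread.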

Since each thread works on its own private copy, there is no possibility of a race condition. For Numba to be able to deduce this, you have to stay within its restrictions: the documentation states that Numba generates race-free parallel code for += on scalars and whole arrays (y += x[i]), but not on array elements or slices (y[:] += x[i] or y[1] += x[i]).
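Following those restrictions, the first example can be rewritten without a race in at least two ways. This is a sketch under stated assumptions, not code from the Numba documentation: scalar_reduction accumulates into a plain scalar (a supported reduction) and fills y once after the loop, while parallel_over_outputs parallelizes over the output index so each iteration owns exactly one y[j]. Both function names are hypothetical, and the import fallback is an addition so the logic still runs serially when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch also runs (serially) without Numba installed.
    def njit(*args, **kwargs):
        return lambda func: func
    prange = range


@njit(parallel=True)
def scalar_reduction(x):
    # `total += ...` on a plain variable is a reduction Numba recognizes:
    # each thread keeps a private partial sum that is combined after the loop.
    n = x.shape[0]
    total = 0.0
    for i in prange(n):
        total += x[i]
    y = np.zeros(4)
    y[:] = total          # slice assignment happens once, outside the loop
    return y


@njit(parallel=True)
def parallel_over_outputs(x):
    # Each iteration of the prange loop owns exactly one element y[j],
    # so no two threads ever write to the same memory location.
    n = x.shape[0]
    y = np.zeros(4)
    for j in prange(4):
        for i in range(n):
            y[j] += x[i]
    return y


x = np.ones(10000)
print(scalar_reduction(x))
print(parallel_over_outputs(x))
```

The second variant can use at most as many threads as there are output elements (four here), so for a short output and a long input the scalar reduction is usually the better fit.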

Source: https://stackoverflow.com/questions/59596794/understanding-this-race-condition-in-numba-parallelization

Tags

python

parallel-processing

numba