Question
I'm trying to develop an AI that plays a 1-player board game optimally. I'm using a depth-first search limited to a few levels.
I've tried to speed it up by multithreading the initial loop that iterates over all moves and recurses into the game trees. The idea is that the threads split the initial candidate boards into chunks, and each board is then evaluated further in a separate recursive function. All functions called are nogil.
However, I'm seeing what I can only guess is a race condition, because the multithreaded solution gives different results, and I'm not sure how to go about fixing it.
cdef struct Move:
    int x
    int y
    int score

cdef Move search( board_t& board, int prevClears, int maxDepth, int depth ) nogil:
    cdef Move bestMove
    cdef Move recursiveMove
    cdef vector[ Move ] moves = generateMoves( board )
    cdef board_t nextBoard
    cdef int i, clears
    bestMove.score = 0
    # Split the initial possible move boards amongst threads
    for i in prange( <int> moves.size(), nogil = True ):
        # Applies move and calculates the move score
        nextBoard = applyMove( board, moves[ i ], prevClears, maxDepth, depth )
        # Recursively evaluate further moves
        if maxDepth - depth > 0:
            clears = countClears( nextBoard )
            recursiveMove = recursiveSearch( nextBoard, moves[ i ], clears, maxDepth, depth + 1 )
            moves[ i ].score += recursiveMove.score
        # Update bestMove
        if moves[ i ].score > bestMove.score:
            bestMove = moves[ i ]
    return bestMove
Answer 1:
Cython does some magic when prange is involved, and that magic depends on subtle details, so one really has to look at the generated C code to understand what is going on.
As far as I can tell from your code, there are at least two problems.
1. Problem: bestMove isn't initialized.
%%cython -+
cdef struct Move:
    ...
def foo():
    cdef Move bestMove
    return bestMove
would result in the following C code:
...
struct __pyx_t_XXX_Move __pyx_v_bestMove;
...
__pyx_r = __pyx_convert__to_py_struct____pyx_t_XXX_Move(__pyx_v_bestMove); if ...
return __pyx_r;
The local variable __pyx_v_bestMove stays uninitialized (see e.g. this SO-post), even though it is quite possible that its initial value happens to consist only of zeros.
Were bestMove an int, for example, Cython would give a warning, but it doesn't for structs.
2. Problem: assigning to bestMove leads to a race condition.
Btw, the result might not only fail to be the best move, it might even be an illegal move altogether, because it could combine the x, y and score values of different legal moves.
Here is a smaller reproducer of the issue:
%%cython -c=-fopenmp --link-args=-fopenmp
# cython
cimport cython
from cython.parallel import prange

cdef struct A:
    double a

@cython.boundscheck(False)
def search_max(double[::1] vals):
    cdef A max_val = [-1.0]  # initialized!
    cdef int i
    cdef int n = len(vals)
    for i in prange(n, nogil=True):
        if(vals[i]>max_val.a):
            max_val.a = vals[i]
    return max_val.a
Were max_val a cdef double, Cython wouldn't build it, as it would try to make max_val private (subtle magic). But as it is, max_val is shared between threads (see the resulting C code) and access to it should be guarded. If it is not, we can see the following result (one might need to run it multiple times to trigger the race condition):
>>> import numpy as np
>>> a = np.random.rand(1000)
>>> search_max(a)-search_max(a)
#0.0006253360398751351 but should be 0.0
What can be done? As @DavidW has proposed, we could collect the maximum per thread and then find the overall maximum in a post-processing step (see this SO-post), which leads to:
%%cython -+ -c=-fopenmp --link-args=-fopenmp
cimport cython
from cython.parallel import prange, threadid
from libcpp.vector cimport vector
cimport openmp

cdef struct A:
    double a

@cython.boundscheck(False)
def search_max(double[::1] vals):
    cdef int i, tid
    cdef int n = len(vals)
    cdef vector[A] max_vals
    # every thread gets its own max value:
    NUM_THREADS = 4
    max_vals.resize(NUM_THREADS, [-1.0])
    for i in prange(n, nogil=True, num_threads = NUM_THREADS):
        tid = threadid()
        if(vals[i]>max_vals[tid].a):
            max_vals[tid].a = vals[i]
    #post process, collect results of threads:
    cdef double res = -1.0
    for i in range(NUM_THREADS):
        if max_vals[i].a>res:
            res = max_vals[i].a
    return res
I think it is easier and less error-prone to use the OpenMP functionality from C/C++ and wrap the resulting code with Cython: not only does Cython not support everything OpenMP offers, but spotting problems in parallel code is hard enough when looking at plain C code, without any implicit magic done by Cython.
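To make that last suggestion concrete, here is a minimal C++ sketch of the "one best value per thread, then merge" idea, applied to a struct shaped like the Move from the question. It is only an illustration under assumptions: the names best_move, is_better and fallback are made up for this example, the scores are assumed to be already computed (in the real search, the recursive evaluation would happen inside the loop), and ties are broken by the lowest index so that the result does not depend on thread scheduling.

```cpp
#include <vector>

// Mirrors the Move struct from the question.
struct Move {
    int x;
    int y;
    int score;
};

// Hypothetical helper: true if the move at index cand should replace the
// move at index cur. Ties on score go to the lower index.
static bool is_better(const std::vector<Move>& moves, long cand, long cur) {
    if (cur < 0) return true;  // nothing chosen yet
    if (moves[cand].score != moves[cur].score)
        return moves[cand].score > moves[cur].score;
    return cand < cur;
}

// Hypothetical function: returns the best move, or fallback if moves is empty.
Move best_move(const std::vector<Move>& moves, Move fallback) {
    const long n = static_cast<long>(moves.size());
    long best_idx = -1;  // shared; only touched inside the critical section

    #pragma omp parallel
    {
        long local_idx = -1;  // declared inside the region, so private per thread

        #pragma omp for nowait
        for (long i = 0; i < n; ++i) {
            if (is_better(moves, i, local_idx)) {
                local_idx = i;
            }
        }

        // Merge step: one thread at a time folds its local winner
        // into the shared result.
        #pragma omp critical
        {
            if (local_idx >= 0 && is_better(moves, local_idx, best_idx)) {
                best_idx = local_idx;
            }
        }
    }
    return best_idx >= 0 ? moves[best_idx] : fallback;
}
```

With g++ this would be compiled with -fopenmp; without that flag the pragmas are ignored and the function simply runs serially with the same result. Tracking an index instead of copying the struct means the shared state is a single integer that is only written under the critical section, so a mixed-up Move made of fields from different moves cannot occur. The explicit fallback also sidesteps the uninitialized bestMove from problem 1.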
Source: https://stackoverflow.com/questions/59005318/cython-parallelisation-race-condition-for-dfs