Question
I have a multiprocessing program in which I'm unable to work with global variables. The program starts like this:
from multiprocessing import Process, Pool
print("Initializing")
someList = []
...
...
...
This means that someList is initialized at module level, before my main function is called.
Later on in the code, someList is set to some value, and I then create 4 processes to process it:
pool = Pool(4)
combinedResult = pool.map(processFn, someList)
pool.close()
pool.join()
Before spawning the processes, someList is set to a valid value.
However, when the processes are spawned, I see this printed 4 times:
Initializing
Initializing
Initializing
Initializing
Clearly, the initialization section at the top of the program is being run in each process. Also, someList is reset to an empty list. If my understanding is correct, each process should be a replica of the current process's state, which means I should have got 4 copies of the same list. Why are the globals being re-initialized? And why is that section even being run?
Can someone please explain this to me? I referred to the Python docs but wasn't able to determine the root cause. They do recommend against using globals, and I'm aware of that, but it still doesn't explain why the initialization code is run again. Also, I'd like to use multiprocessing and not multithreading. I'm trying to understand how multiprocessing works here.
Thanks for your time.
Answer 1:
On Windows, processes are not forked as they are on Linux/Unix. Instead they are spawned, which means that a new Python interpreter is started for each new multiprocessing.Process. This means that all global variables are re-initialized, and if you have manipulated them along the way, the spawned processes will not see those changes.
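The behaviour described above can be reproduced on any platform by requesting the "spawn" start method explicitly (it is also the default on macOS since Python 3.8). The following is a minimal sketch, not the asker's program: the helper names read_global and lengths_seen_by_workers are made up for illustration. Because each spawned worker re-imports the file, the module-level print is expected to appear again for each worker, and the workers should report an empty list even though the parent filled it.

```python
import multiprocessing as mp

# Module-level code: executed in the parent, and again in every spawned
# worker, because a spawned worker imports this file from scratch.
print("Initializing")
some_list = []


def read_global(_):
    # Hypothetical helper: reports the length of this process's own copy.
    return len(some_list)


def lengths_seen_by_workers(start_method):
    # Hypothetical helper: ask each worker how long its some_list is.
    ctx = mp.get_context(start_method)
    with ctx.Pool(2) as pool:
        return pool.map(read_global, range(4))


if __name__ == "__main__":
    # This block runs only in the parent; spawned workers skip it.
    some_list.extend([24, 12, 6, 3])
    print("parent sees length", len(some_list))
    print("spawned workers see", lengths_seen_by_workers("spawn"))
```

The `if __name__ == "__main__":` guard is what keeps the workers from creating pools of their own when they re-import the file.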
A solution to the problem is to pass the globals to the Pool initializer and then, from there, make them global in the spawned process as well:
from multiprocessing import Pool

def init_pool(the_list):
    global some_list
    some_list = the_list

def access_some_list(index):
    return some_list[index]

if __name__ == "__main__":
    some_list = [24, 12, 6, 3]
    indexes = [3, 2, 1, 0]
    pool = Pool(initializer=init_pool, initargs=(some_list,))
    result = pool.map(access_some_list, indexes)
    print(result)
In this setup, the globals are copied to each new process and are then accessible there. However, as always, any updates made from then on will not be propagated to any other process. For that you will need something like a proper multiprocessing.Manager.
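For the case where updates do need to be visible across processes, a Manager-backed list is one option. The sketch below is only an illustration under assumed names (append_square and collect_squares are hypothetical): the list lives in a separate manager process, and each worker receives a proxy whose method calls are forwarded to it.

```python
from multiprocessing import Manager, Pool


def append_square(args):
    # Hypothetical worker: 'shared' is a list proxy when called through
    # the pool, so append() is forwarded to the manager process.
    shared, n = args
    shared.append(n * n)


def collect_squares(numbers):
    # Hypothetical helper: workers append to one shared list, and the
    # parent reads the combined result back.
    with Manager() as manager:
        shared = manager.list()
        with Pool(2) as pool:
            pool.map(append_square, [(shared, n) for n in numbers])
        # Sort because the order in which workers append is not fixed.
        return sorted(shared)


if __name__ == "__main__":
    print(collect_squares([1, 2, 3, 4]))
```

Every access through the proxy is a round trip to the manager process, so this suits small amounts of shared state rather than large data that is read constantly.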
As an extra comment, this shows why global variables can be dangerous: it is hard to tell what values they will take in the different processes.
Answer 2:
I think the point is that you are creating 4 processes, each of which executes the code you give it. They do not work in the same instance, but they all execute the same code.
So maybe use multithreading instead, or use some if-clauses etc. to determine which process should execute which code.
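The multithreading suggestion can be sketched with multiprocessing.pool.ThreadPool, which has the same map interface as Pool but runs workers as threads inside one process. Threads share the process's globals, so a list filled before the pool is created is visible to every worker. The function name run_with_threads is an assumption made for this example, and threads will not speed up CPU-bound pure-Python work in the way separate processes can.

```python
from multiprocessing.pool import ThreadPool

some_list = []


def access_some_list(index):
    # All threads read the same module-level list.
    return some_list[index]


def run_with_threads(indexes):
    # Hypothetical helper: map over the indexes using 4 worker threads.
    with ThreadPool(4) as pool:
        return pool.map(access_some_list, indexes)


if __name__ == "__main__":
    some_list.extend([24, 12, 6, 3])
    print(run_with_threads([3, 2, 1, 0]))
```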
- cheers
Source: https://stackoverflow.com/questions/49343907/does-multiprocess-in-python-re-initialize-globals