Question
Is there a way to share a huge dictionary with multiprocessing subprocesses on Windows without duplicating the whole memory? I only need it read-only within the subprocesses, if that helps.
My program roughly looks like this:
import multiprocessing

def workerFunc(args):
    id, data_mp, some_more_args = args
    # Do some logic:
    # parse some files on disk, then access some random keys of data_mp
    # which are only known after parsing those files ...
    some_keys = [some_random_ids...]
    # Do something with those keys
    do_something = [data_mp[x] for x in some_keys]
    return do_something
if __name__ == "__main__":
multiprocessing.freeze_support() # Using this script as a PyInstalled .exe later on ...
DATA = readpickle('my_pickle.pkl') # my_pickle.pkl is huge, ~1GB
# DATA looks like this:
# {1: ['some text', SOME_1D_OR_2D_LIST...[[1,2,3], [123...]]],
# 2: ...,
# 3: ..., ...,
# 1 million keys... }
# Here I'm doing something with DATA in the main programm...
# Then I want to spawn N multiprocessing subprocesses, each doing some logic and than accessing a few keys of DATA to read from ...
manager = multiprocessing.Manager()
data_mp = manager.dict(DATA) # Right now I'm putting DATA into the shared memory... so it effectively duplicates the required memory...
joblist = []
for idx in range(10000): # Generate the workers, pass the shared memory link data_mp to each worker later on ...
joblist.append((idx, data_mp, some_more_args))
# Start Pool of Procs...
p = multiprocessing.Pool()
returnNodes = []
for ret in p.imap_unordered(workerFunc, jobList):
returnNodes.append(ret)
# Do some after work with DATA and returnNodes...
# and generate some overview xls-file out of it
Unfortunately there's no other way to store my big dictionary... I know an SQL database would be better, because each worker only accesses a few keys of data_mp within its subprocess, but I don't know in advance which keys will be addressed by each worker.
So my question is: is there any other way on Windows to do this, instead of using a Manager.dict() which, as stated above, effectively duplicates the required memory?
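For illustration, the only alternative I can think of is to drop the Manager entirely and let every worker process load the pickle itself, once, via a Pool initializer. A minimal sketch of that idea (function names and the placeholder keys are illustrative, not my real code): it avoids the per-access round-trips to the Manager process, but still costs ~1GB per worker process, which is exactly the duplication I want to avoid:

import multiprocessing
import pickle

DATA = None  # per-process global, filled once per worker process

def init_worker(pickle_path):
    # Runs once in each spawned worker process, so the ~1GB pickle
    # is loaded once per process in total, not once per task.
    global DATA
    with open(pickle_path, 'rb') as f:
        DATA = pickle.load(f)

def worker_func(args):
    idx, some_more_args = args
    some_keys = [1, 2, 3]  # placeholder; really only known at runtime
    return [DATA[x] for x in some_keys]

if __name__ == "__main__":
    jobs = [(idx, None) for idx in range(10000)]
    p = multiprocessing.Pool(initializer=init_worker,
                             initargs=('my_pickle.pkl',))
    results = list(p.imap_unordered(worker_func, jobs))
    p.close()
    p.join()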
Thanks!
EDIT Unfortunately, in my corporate environment there's no possibility for my tool to use an SQL DB, because there's no dedicated machine available. I can only work file-based on network drives. I already tried SQLite (roughly as in the sketch below), but it was seriously slow (even though I didn't understand why...). And yes, DATA is a simple key->value kind of dictionary...
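My SQLite attempt looked roughly like this sketch (the kv table and column names are illustrative, not my exact schema): each worker opened its own connection to a database file on the network drive and did point lookups by key.

import sqlite3

def lookup(db_path, some_keys):
    # One point lookup per key; the values were pickled blobs.
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    rows = []
    for k in some_keys:
        cur.execute("SELECT value FROM kv WHERE id = ?", (k,))
        rows.append(cur.fetchone())
    conn.close()
    return rows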
And I'm using Python 2.7!
Source: https://stackoverflow.com/questions/60435574/python-multiprocessing-on-windows-shared-readonly-memory