I have written a program that can be summarized as follows:

    def loadHugeData():
        # load it
        return data

    def processHugeData(data, res_queue):
        for item in data:
            # process each item and push the result
            res_queue.put(item)
The `multiprocessing` module is effectively based on the `fork` system call, which creates a copy of the current process. Since you are loading the huge data before you fork (i.e. before you create the `multiprocessing.Process`), the child process inherits a copy of the data.
However, if the operating system you are running on implements COW (copy-on-write), there will only actually be one copy of the data in physical memory unless you modify it in either the parent or the child process (both parent and child share the same physical memory pages, albeit in different virtual address spaces). Even then, additional memory is only allocated for the changed pages (in page-size increments).
You can avoid this situation by creating the `multiprocessing.Process` before you load your huge data. Then the memory allocated for the data in the parent will not be reflected in the child process.