Python multiprocessing memory usage

Asked by 庸人自扰 on 2020-12-03 14:31

I have written a program that can be summarized as follows:

def loadHugeData():
    data = ...  # load it (details elided in the original post)
    return data

def processHugeData(data, res_queue):
    # the original snippet is truncated here; presumably the loop
    # iterates over the data and pushes results onto res_queue
    for item in data:
        res_queue.put(item)

1 Answer
  • Answered 2020-12-03 14:45

    The multiprocessing module is effectively based on the fork system call (on Unix-like systems, with the default start method), which creates a copy of the current process. Because you load the huge data before you fork (that is, before you create and start the multiprocessing.Process), the child process inherits a copy of that data.

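    As a minimal sketch of that situation (the names and sizes below are placeholders, not from the original post), under the default fork start method on Linux the child ends up with its own view of data built in the parent:

        import multiprocessing

        def worker(data):
            # with the fork start method the child inherits the parent's
            # address space, so it sees the list built before the fork
            print("child sees", len(data), "items")

        if __name__ == "__main__":
            data = list(range(10**6))          # stand-in for the huge data
            p = multiprocessing.Process(target=worker, args=(data,))
            p.start()
            p.join()
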
    However, if the operating system you are running on implements COW (copy-on-write), there will only be one copy of the data in physical memory until either the parent or the child modifies it: both processes map the same physical pages, albeit at different virtual addresses. Even then, additional memory is allocated only for the parts that change, in page-size increments.

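    One rough way to observe this (a Unix-only sketch that assumes the third-party psutil package is installed; the 200 MB figure is arbitrary) is to watch the child's unique set size, i.e. the memory private to the process, before and after it dirties the inherited pages:

        import os
        import psutil                          # third-party, assumed installed

        def uss_mb():
            # unique set size: memory private to this process, in MB
            return psutil.Process().memory_full_info().uss // (1024 * 1024)

        data = bytearray(200 * 1024 * 1024)    # ~200 MB, shared after fork

        if os.fork() == 0:                     # child
            print("USS before writing: ~%d MB" % uss_mb())  # small: pages shared
            for i in range(0, len(data), 4096):
                data[i] = 1                    # dirty one byte per page
            print("USS after writing:  ~%d MB" % uss_mb())  # ~200 MB now copied
            os._exit(0)
        os.wait()
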
    You can avoid this situation by creating and starting the multiprocessing.Process before you load your huge data. Then the memory allocated when you load the data in the parent will never appear in the child process at all. A sketch of that restructuring follows.

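    Here the worker is started first and the parent streams items to it over a queue; the queue names, the None sentinel, and the placeholder loadHugeData body are assumptions for illustration, not code from the original post:

        import multiprocessing

        def loadHugeData():
            return list(range(10**5))          # placeholder for the real load

        def worker(task_queue, res_queue):
            # runs in a process forked before the load, so its address
            # space never contains the full dataset
            total = 0
            for item in iter(task_queue.get, None):   # None ends the loop
                total += 1                     # stand-in for real processing
            res_queue.put(total)

        if __name__ == "__main__":
            task_queue = multiprocessing.Queue()
            res_queue = multiprocessing.Queue()
            p = multiprocessing.Process(target=worker,
                                        args=(task_queue, res_queue))
            p.start()                          # child created before the load
            data = loadHugeData()              # huge data lives only in parent
            for item in data:
                task_queue.put(item)
            task_queue.put(None)               # signal end of input
            print("processed", res_queue.get(), "items")
            p.join()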