Question
I wrote a script that I deploy on an HPC node with 112 cores, which starts 112 processes at a time until all 400 tasks are completed (node_combinations
is a list of 400 tuples). The relevant snippet of code is below:
# Parallel Path Probability Calculation
# =====================================
import datetime
import logging
from multiprocessing import Pool

# all ordered pairs of nodes (400 tuples)
node_combinations = [(i, j) for i in g.nodes for j in g.nodes]
pool = Pool()  # defaults to one worker per core (112 here)
start = datetime.datetime.now()
logging.info("Start time: %s", start)
print("Start time: ", start)
pool.starmap(g._print_probability_path_ij, node_combinations)
end = datetime.datetime.now()
print("End time: ", end)
print("Run time: ", end - start)
logging.info("End time: %s", end)
logging.info("Total run time: %s", end - start)
pool.close()
pool.join()
I monitor performance by running htop
and observed the following. Initially all 112 cores are working at 100%. Eventually, since some processes are shorter than others, I am left with a smaller number of cores working at 100%. Finally, all processes are shown as sleeping.
I believe the problem is that some of these processes (the ones that take longer, about 20 out of 400) require a lot of memory. When memory runs short, the processes go to sleep and since memory is never freed, they remain there, sleeping. These are my questions:
Once a process finishes, are its resources (read: memory) freed, or do they remain occupied until all processes finish? In other words, once I have only 20 cores working (because the others have already processed all the shorter tasks), do they have access to all the memory, or only to the memory not being used by the rest of the processes?
I've read that maxtasksperchild may help in this situation. How would that work? How can I determine the appropriate number of tasks for each child?
If you wonder why I am asking this, it's because I read this in the documentation: "New in version 2.7: maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed. The default maxtasksperchild is None, which means worker processes will live as long as the pool."
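As a minimal sketch of how this parameter is passed to the Pool constructor (the worker function, data, and pool size below are illustrative placeholders, not taken from the question):

from multiprocessing import Pool

def work(i, j):
    # stand-in for the real per-pair computation
    return i * j

if __name__ == "__main__":
    pairs = [(i, j) for i in range(20) for j in range(20)]
    # each worker exits after completing one task and is replaced by a
    # fresh process, so whatever memory it grew to is returned to the OS
    with Pool(processes=4, maxtasksperchild=1) as pool:
        results = pool.starmap(work, pairs)

maxtasksperchild=1 recycles a worker after every task, which frees memory most aggressively but adds process start-up overhead; a larger value such as 10 or 50 trades recycling frequency against that overhead.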
Answer 1:
You should leave at least one core available to the host OS and one available to the initiating script; try reducing your pool size, e.g. Pool(110).
Use Pool.imap (or imap_unordered) instead of Pool.map. This will iterate over the data lazily rather than loading all of it into memory before processing starts.
Set a value for the maxtasksperchild parameter; a sketch combining these three suggestions follows below.
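A sketch of all three suggestions together, assuming (as in the question) that g is defined at module level so the forked workers inherit it; the pool size and maxtasksperchild value are illustrative choices, not prescribed numbers:

from multiprocessing import Pool

def run_pair(pair):
    # unpack the (i, j) tuple for the two-argument method;
    # starmap has no lazy variant, so imap needs this wrapper
    i, j = pair
    return g._print_probability_path_ij(i, j)

if __name__ == "__main__":
    node_combinations = [(i, j) for i in g.nodes for j in g.nodes]
    # leave cores free for the OS and the parent script, and recycle
    # each worker after 10 tasks so its memory is returned to the OS
    with Pool(processes=110, maxtasksperchild=10) as pool:
        # imap_unordered hands tasks out lazily and yields results
        # as they complete rather than collecting them all first
        for _ in pool.imap_unordered(run_pair, node_combinations):
            pass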
When you use a multiprocessing Pool on Unix, child processes are created using the fork() system call. Each of those processes starts with a copy of the parent process's memory at that moment. Because you load the list of tuples before you create the Pool, every process in the pool has a copy of that data.
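A minimal sketch of that ordering point (the worker function is a placeholder): creating the Pool before building the large list means the forked children never inherit it, and the tuples only reach workers as pickled arguments, chunk by chunk:

from multiprocessing import Pool

def work(pair):
    i, j = pair
    return i + j  # placeholder computation

if __name__ == "__main__":
    # fork happens here, while the parent's memory is still small
    pool = Pool()
    # built after the fork, so it exists only in the parent...
    node_combinations = [(i, j) for i in range(20) for j in range(20)]
    # ...and reaches each worker only as pickled argument chunks
    results = pool.map(work, node_combinations)
    pool.close()
    pool.join()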
The answer here walks through a method of memory profiling so you can see where your memory is going, when.
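The linked answer's method is not reproduced here, but as one hedged illustration of per-worker memory profiling, the standard-library resource module (Unix only) can report each worker's peak resident set size:

import resource
from multiprocessing import Pool

def work(pair):
    i, j = pair
    data = [0] * (10_000 * (i + j + 1))  # placeholder allocation
    # ru_maxrss is this process's peak RSS
    # (kilobytes on Linux, bytes on macOS)
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return i, j, peak

if __name__ == "__main__":
    pairs = [(i, j) for i in range(5) for j in range(5)]
    with Pool(4) as pool:
        for i, j, peak in pool.imap_unordered(work, pairs):
            print(f"pair ({i}, {j}): peak RSS {peak}")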
Source: https://stackoverflow.com/questions/59406622/multiprocessing-pool-memory-usage