> … have been fiddling with Python's multicore function for upwards of an hour now, trying to parallelize a rather complex graph traversal function using Pro…
Too much piling up here to address in comments, so, where `mp` is `multiprocessing`:
`mp.cpu_count()` should return the number of processors. But test it. Some platforms are funky, and this info isn't always easy to get. Python does the best it can.
If you start 24 processes, they'll do exactly what you tell them to do ;-) Looks like `mp.Pool()` would be most convenient for you. You pass the number of processes you want to create to its constructor. `mp.Pool(processes=None)` will use `mp.cpu_count()` for the number of processors.
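For example, a quick sanity check of both points on your machine (a throwaway sketch, not part of your code):

```python
import multiprocessing as mp

def square(n):
    return n * n

if __name__ == "__main__":
    # See what Python thinks the CPU count is on this box.
    print("cpu_count() reports:", mp.cpu_count())

    # processes=None (the default) makes the Pool use mp.cpu_count() workers.
    with mp.Pool(processes=None) as pool:
        print(pool.map(square, range(10)))
```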
Then you can use, for example, `.imap_unordered(...)` on your `Pool` instance to spread your `degreelist` across processes. Or maybe some other `Pool` method would work better for you - experiment.
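For concreteness, the bare calling pattern looks like this (a toy sketch with a made-up `work()`; note that `imap_unordered` hands results back in whatever order workers finish them):

```python
import multiprocessing as mp

def work(node):
    # Stand-in for your per-node traversal.
    return node

if __name__ == "__main__":
    degreelist = list(range(20))  # stand-in for your real degreelist
    with mp.Pool() as pool:
        # Results arrive as workers finish them, not in input order.
        for result in pool.imap_unordered(work, degreelist, chunksize=5):
            print(result)
```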
If you can't bash the problem into `Pool`'s view of the world, you could instead use an `mp.Queue` as a work queue, `.put()`'ing nodes to work on (or slices of nodes, to reduce overhead) in the main program, and writing the workers to `.get()` work items off that queue. Ask if you need examples. Note that you need to put sentinel values (one per process) on the queue, after all the "real" work items, so that worker processes can test for the sentinel to know when they're done.
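Here's a bare-bones sketch of that queue-plus-sentinels pattern to start from (the worker body and the fake nodes are placeholders for your traversal code):

```python
import multiprocessing as mp

SENTINEL = None  # goes on the queue once per worker, after all real items

def worker(q):
    while True:
        node = q.get()
        if node is SENTINEL:
            break  # no more work for this process
        # ... process `node` here (your traversal code goes in this spot) ...

if __name__ == "__main__":
    nprocs = mp.cpu_count()
    q = mp.Queue()
    procs = [mp.Process(target=worker, args=(q,)) for _ in range(nprocs)]
    for p in procs:
        p.start()
    for node in range(1000):    # stand-in for your real nodes
        q.put(node)
    for _ in range(nprocs):     # one sentinel per worker so each can stop
        q.put(SENTINEL)
    for p in procs:
        p.join()
```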
FYI, I like queues because they're more explicit. Many others like `Pool`s better because they're more magical ;-)
Here's an executable prototype for you. This shows one way to use `imap_unordered` with `Pool` and `chunksize` that doesn't require changing any function signatures. Of course you'll have to plug in your real code ;-) Note that the `init_worker` approach allows passing "most of" the arguments only once per processor, not once for every item in your `degreelist`. Cutting the amount of inter-process communication can be crucial for speed.
```python
import multiprocessing as mp

def init_worker(mps, fps, cut):
    # Runs once in each worker process: stash the "mostly constant"
    # arguments in globals so work() doesn't need them passed per item.
    global memorizedPaths, filepaths, cutoff
    global DG

    print("process initializing", mp.current_process())
    memorizedPaths, filepaths, cutoff = mps, fps, cut
    DG = 1  ## nx.read_gml("KeggComplete.gml", relabel=True)

def work(item):
    _all_simple_paths_graph(DG, cutoff, item, memorizedPaths, filepaths)

def _all_simple_paths_graph(DG, cutoff, item, memorizedPaths, filepaths):
    pass  # print("doing " + str(item))

if __name__ == "__main__":
    m = mp.Manager()
    memorizedPaths = m.dict()
    filepaths = m.dict()
    cutoff = 1  ##

    # use all available CPUs
    p = mp.Pool(initializer=init_worker,
                initargs=(memorizedPaths, filepaths, cutoff))
    degreelist = range(100000)  ##
    for _ in p.imap_unordered(work, degreelist, chunksize=500):
        pass
    p.close()
    p.join()
```
I strongly advise running this exactly as-is, so you can see that it's blazing fast. Then add things to it a bit at a time, to see how that affects the time. For example, just adding `memorizedPaths[item] = item` to `_all_simple_paths_graph()` slows it down enormously. Why? Because the dict gets bigger and bigger with each addition, and this process-safe dict has to be synchronized (under the covers) among all the processes. The unit of synchronization is "the entire dict" - there's no internal structure the mp machinery can exploit to do incremental updates to the shared dict.
If you can't afford this expense, then you can't use a `Manager.dict()` for this. Opportunities for cleverness abound ;-)
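One possibility, for instance, if the shared dict is only collecting results (rather than being read by the workers as they go): have `work()` return its output and build an ordinary dict in the main process, so there's nothing shared to synchronize while the computation runs. A minimal sketch (the squared numbers are just stand-in output):

```python
import multiprocessing as mp

def work(item):
    # Compute and *return* the result instead of writing it into a shared
    # Manager.dict(); the return value travels back over ordinary pool IPC.
    return item, item * item   # stand-in for real per-node output

if __name__ == "__main__":
    results = {}               # plain dict, lives only in the main process
    with mp.Pool() as pool:
        for key, value in pool.imap_unordered(work, range(100000), chunksize=500):
            results[key] = value
    print(len(results))
```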