Question
Here is a complete, simple working example:

    import multiprocessing as mp
    import time
    import random


    class Foo:
        def __init__(self):
            # some expensive set up function in the real code
            self.x = 2
            print('initializing')

        def run(self, y):
            time.sleep(random.random() / 10.)
            return self.x + y


    def f(y):
        foo = Foo()
        return foo.run(y)


    def main():
        pool = mp.Pool(4)
        for result in pool.map(f, range(10)):
            print(result)
        pool.close()
        pool.join()


    if __name__ == '__main__':
        main()

How can I modify it so that Foo is initialized only once by each worker, not once per task? Basically, I want the init called 4 times, not 10. I am using Python 3.5.

Answer 1:
The intended way to deal with things like this is via the optional initializer and initargs arguments to the Pool() constructor. They exist precisely to give you a way to do stuff exactly once when a worker process is created. So, e.g., add:

    def init():
        global foo
        foo = Foo()

and change the Pool creation to:

    pool = mp.Pool(4, initializer=init)

If you needed to pass arguments to your per-process initialization function, then you'd also add an appropriate initargs=... argument.
Note: of course you should also remove the foo = Foo() line from f(), so that your function uses the global foo created by init().
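
Putting the pieces of this answer together, a minimal sketch of the whole script might look like the following. The x parameter on Foo and the value 2 passed through initargs are assumptions added only to show how arguments reach the initializer; the Foo in the question takes no arguments.

```python
import multiprocessing as mp
import os


class Foo:
    def __init__(self, x):
        # Stand-in for the expensive setup in the real code.
        self.x = x
        print('initializing in process', os.getpid())

    def run(self, y):
        return self.x + y


def init(x):
    # Runs once in each worker process when the pool starts it.
    global foo
    foo = Foo(x)


def f(y):
    # Reuses the per-process global created by init().
    return foo.run(y)


def main():
    # initargs is the tuple of positional arguments passed to init().
    with mp.Pool(4, initializer=init, initargs=(2,)) as pool:
        results = pool.map(f, range(10))
    print(results)
    return results


if __name__ == '__main__':
    main()
```

With four workers this should print the initializing message four times, once per worker process, no matter how many tasks are mapped.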

Answer 2:
The most obvious approach is lazy loading: create Foo the first time a worker needs it, then reuse it.
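
    _foo = None  # one cached instance per worker process

    def f(y):
        global _foo
        # Compare against None so a falsy Foo instance is not rebuilt.
        if _foo is None:
            _foo = Foo()
        return _foo.run(y)
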
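
To check that lazy loading really builds Foo at most once per worker, here is a rough sketch that returns the worker's PID alongside each result. The get_foo() helper and the PID bookkeeping are illustrative additions, not part of the original answer.

```python
import multiprocessing as mp
import os


class Foo:
    def __init__(self):
        # Stand-in for the expensive setup in the real code.
        self.x = 2

    def run(self, y):
        return self.x + y


_foo = None  # one cached instance per process


def get_foo():
    # Hypothetical helper: builds Foo on first use in this process.
    global _foo
    if _foo is None:
        _foo = Foo()
    return _foo


def f(y):
    # Return the worker's PID too, so the caller can see who ran the task.
    return os.getpid(), get_foo().run(y)


def main():
    with mp.Pool(4) as pool:
        pairs = pool.map(f, range(10))
    results = [value for _, value in pairs]
    worker_pids = {pid for pid, _ in pairs}
    print(results)
    print('workers that built a Foo:', len(worker_pids))
    return results, worker_pids


if __name__ == '__main__':
    main()
```

One difference from the initializer approach: a worker that never receives a task never builds a Foo, so the reported count can be lower than the pool size.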

Source: https://stackoverflow.com/questions/38795826/optimizing-multiprocessing-pool-with-expensive-initialization