I'm using an LRU cache to speed up some rather heavy-duty processing. It works well and speeds things up considerably. However...
When I multiprocess, each process creates its own separate cache, so nothing is shared between the workers.
I believe you can use a Manager to share a dict between processes. That should, in theory, let you use the same cache for all functions.
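For example, here is a minimal sketch of that idea (hypothetical names throughout: slow_work stands in for the expensive function, and cache is the shared Manager dict passed to each worker):

import multiprocessing

def slow_work(x):
    return x * x  # stand-in for the heavy computation

def worker(args):
    cache, x = args
    if x not in cache:           # check the shared dict first...
        cache[x] = slow_work(x)  # ...and fill it on a miss
    return cache[x]

if __name__ == "__main__":
    manager = multiprocessing.Manager()
    cache = manager.dict()  # proxy dict shared between all processes
    with multiprocessing.Pool() as pool:
        print(pool.map(worker, [(cache, x) for x in (1, 2, 2, 3)]))  # [1, 4, 4, 9]

Note that the check-then-set is not atomic, so two workers can occasionally recompute the same value; for a cache that is harmless.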
However, I think a saner approach would be to have one process that responds to queries by looking them up in the cache and, if they are not present, delegating the work to a subprocess and caching the result before returning it. You could do that easily with:
import concurrent.futures
import functools

with concurrent.futures.ProcessPoolExecutor() as e:
    @functools.lru_cache(maxsize=None)
    def work(*args, **kwargs):
        return e.submit(slow_work, *args, **kwargs)
Note that work will return Future objects, which the consumer will have to wait on. The lru_cache will cache the Future objects, so they will be returned automatically; I believe you can access their data more than once, but can't test it right now.
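A hypothetical usage sketch of that pattern (slow_work stands in for the expensive function): repeated calls with the same arguments return the very same cached Future, and its result can be read more than once.

import concurrent.futures
import functools

def slow_work(n):
    return n * n  # stand-in for the heavy computation

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as e:

        @functools.lru_cache(maxsize=None)
        def work(n):
            return e.submit(slow_work, n)

        first = work(10)
        again = work(10)       # cache hit: the same Future object comes back
        assert first is again
        print(first.result())  # the consumer waits here
        print(again.result())  # .result() can be called repeatedly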
If you're not using Python 3, you'll have to install backported versions of concurrent.futures and functools.lru_cache (the futures and functools32 packages on PyPI provide them for Python 2).
Pass the shared cache to each process. The parent process can instantiate a single cache and pass it to each process as an argument...
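One way to sketch that idea (hypothetical names; it assumes the shared cache is a multiprocessing.Manager dict handed to every worker once through the pool initializer):

import multiprocessing

def init_cache(cache):
    global shared_cache  # each worker process keeps a reference to the proxy
    shared_cache = cache

def compute(key):
    if key not in shared_cache:
        shared_cache[key] = key * key  # stand-in for the real work
    return shared_cache[key]

if __name__ == "__main__":
    manager = multiprocessing.Manager()
    cache = manager.dict()  # one cache, created by the parent
    with multiprocessing.Pool(initializer=init_cache, initargs=(cache,)) as pool:
        print(pool.map(compute, [1, 2, 2, 3, 3, 3]))  # [1, 4, 4, 9, 9, 9]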
@utils.lru_cache(maxsize=300)
def get_stuff(key):
    return Stuff(key)

def process(stuff_obj):
    # get_stuff(key) <-- remove it from here
    stuff_obj.execute()

def iterate_stuff(keys):
    for key in keys:
        yield get_stuff(key)  # <-- and put it here

def main():
    ...
    keys = get_list_of_keys()
    for result in pool.imap(process, iterate_stuff(keys)):
        evaluate(result)
    ...
Katriel put me on the right track and I would have implemented that answer, but, silly me, this is actually even easier than that [for this application]: because the generator feeding pool.imap is consumed in the parent process, get_stuff and its cache stay in a single process and nothing has to be shared at all.