Reading data in parallel with multiprocess

Posted by 谁都会走 on 2020-01-05 05:52:08

Question


Can this be done?

What I have in mind is the following:

I'll have a dict, and each child process will add a new key:value pair to the dict.

Can this be done with multiprocessing? Are there any limitations?

Thanks!


Answer 1:


If each child process just reads in its own data and contributes a single key-value pair, you can use a Pool:

import multiprocessing

def worker(x):
    # Each worker returns a (key, value) pair.
    return x, x ** 2

if __name__ == '__main__':
    multiprocessing.freeze_support()

    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    d = dict(pool.map(worker, range(10)))
    print(d)

Output:

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}



Answer 2:


Yes, Python supports multiprocessing.

Since you intend to work with the same dict from every worker, however, I would suggest multi-threading rather than multiprocessing. This lets every thread use the same dict directly, rather than having to send data from child processes back into the parent's dict.

Obviously, you'll have issues if your method of input is user dependent or coming from stdin. But if you are getting input from a file, it should work fine.

I suggest this blog to assist you in using a thread pool. It also explains (somewhat) the use of multiprocessing.dummy, which the docs do not.
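A minimal sketch of the threading approach: multiprocessing.dummy exposes the same Pool API as multiprocessing, but backed by threads, so the results can be collected into one dict in the parent without any inter-process transfer. The worker function here is just an illustrative placeholder.

```python
# Thread-based pool: same API as multiprocessing.Pool, but runs in one process.
from multiprocessing.dummy import Pool as ThreadPool

def worker(x):
    # Placeholder work; any I/O-bound read would go here.
    return x, x ** 2

if __name__ == '__main__':
    pool = ThreadPool(4)
    # All threads share the parent's memory, so building the dict is trivial.
    d = dict(pool.map(worker, range(10)))
    pool.close()
    pool.join()
    print(d)
```

Because threads share memory, this avoids the serialization overhead of multiprocessing, though the GIL means it only helps for I/O-bound work.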




Answer 3:


If you use multiprocessing, the entries need to be propagated back to the parent process's dictionary, but there is a solution for this.

Using multiprocessing is helpful because of the GIL: separate processes put all the cores to good use. To share the dict between processes, I use a manager, like:

a_manager = multiprocessing.Manager()

Then I create the shared structure:

shared_map = a_manager.dict()

and in the calls to start the process workers:

worker_seq = []
for n in range(multiprocessing.cpu_count()):
    worker_seq.append(multiprocessing.Process(target=my_work_function, args=(shared_map,)))
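Putting those pieces together, here is a self-contained sketch of the Manager approach. The body of my_work_function is an assumption (the original does not show it); here each worker simply writes one squared value under its own key to demonstrate that writes through the managed dict reach the parent.

```python
import multiprocessing

def my_work_function(shared_map, n):
    # Hypothetical worker body: each process adds one key:value pair.
    shared_map[n] = n ** 2

if __name__ == '__main__':
    a_manager = multiprocessing.Manager()
    shared_map = a_manager.dict()  # proxy dict, shared across processes

    worker_seq = []
    for n in range(multiprocessing.cpu_count()):
        p = multiprocessing.Process(target=my_work_function, args=(shared_map, n))
        worker_seq.append(p)
        p.start()

    for p in worker_seq:
        p.join()

    # Copy the proxy into a plain dict in the parent.
    print(dict(shared_map))
```

The Manager runs a server process that holds the real dict; each worker's writes go through a proxy, which is slower than shared memory but keeps the code simple.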

There is quite a bit of prior art on this, for example:

  • Python multiprocessing: How do I share a dict among multiple processes?

  • share dict between processes

  • python multiprocess update dictionary synchronously

  • Python sharing a dictionary between parallel processes



Source: https://stackoverflow.com/questions/38378310/reading-data-in-parallel-with-multiprocess
