multiprocess or threading in python?

我在风中等你 2020-12-01 02:31

I have a Python application that grabs a collection of data, and for each piece of data in that collection it performs a task. The task takes some time to complete as there i…

8 answers
  • 2020-12-01 02:32

    If you are truly compute bound, using the multiprocessing module is probably the lightest weight solution (in terms of both memory consumption and implementation difficulty.)

    If you are I/O bound, using the threading module will usually give you good results. Make sure that you use thread-safe storage (like queue.Queue) to hand data to your threads, or else hand each thread a single piece of data that is unique to it when it is spawned; a minimal sketch of the Queue approach follows below.
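
    As a rough sketch of that Queue hand-off (the worker body, the item list, and the thread count are just placeholders, not from your application):

    import threading
    from queue import Queue

    def worker(q):
        while True:
            item = q.get()
            if item is None:          # sentinel: no more work for this thread
                break
            # ... do the I/O-bound task for this item ...
            print("processed", item)

    q = Queue()
    threads = [threading.Thread(target=worker, args=(q,)) for _ in range(4)]
    for t in threads:
        t.start()

    for item in ["a", "b", "c"]:      # hand each piece of data to the pool of threads
        q.put(item)

    for _ in threads:
        q.put(None)                   # one sentinel per thread
    for t in threads:
        t.join()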

    PyPy is focused on performance. It has a number of features that can help with compute-bound processing. They also have support for Software Transactional Memory, although that is not yet production quality. The promise is that you can use simpler parallel or concurrent mechanisms than multiprocessing (which has some awkward requirements.)

    Stackless Python is also a nice idea. Stackless has portability issues, as indicated in another answer here. Unladen Swallow was promising, but is now defunct. Pyston is another (unfinished) Python implementation focusing on speed. It takes a different approach from PyPy, which may yield better (or just different) speedups.

  • 2020-12-01 02:35

    You may want to look at Twisted. It is designed for asynchronous network tasks.

  • 2020-12-01 02:37

    You might consider looking into Stackless Python. If you have control over the function that takes a long time, you can just throw some stackless.schedule() calls in there (which yield to the next coroutine), or else you can set Stackless to preemptive multitasking.

    In Stackless, you don't have threads, but tasklets or greenlets which are essentially very lightweight threads. It works great in the sense that there's a pretty good framework with very little setup to get multitasking going.
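
    A minimal sketch of cooperative tasklets might look like this (assuming Stackless Python is installed; the worker here is purely illustrative):

    import stackless

    def worker(name):
        for step in range(3):
            print(name, "step", step)
            stackless.schedule()      # yield to the next tasklet

    # Create two tasklets and run the round-robin scheduler until they finish.
    stackless.tasklet(worker)("task-A")
    stackless.tasklet(worker)("task-B")
    stackless.run()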

    However, Stackless hinders portability because you have to replace a few of the standard Python libraries -- Stackless removes reliance on the C stack. It's very portable if the next user also has Stackless installed, but that will rarely be the case.

  • 2020-12-01 02:39

    For small collections of data, simply create subprocesses with subprocess.Popen.

    Each subprocess can simply get its piece of data from stdin or from command-line arguments, do its processing, and write the result to an output file.

    When the subprocesses have all finished (or timed out), you simply merge the output files.

    Very simple.
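
    A rough sketch of that pattern (process_item.py is a hypothetical worker script that reads one item from its command line and writes its result to the given output file):

    import subprocess

    items = ["a", "b", "c"]
    procs = []
    for i, item in enumerate(items):
        out_path = "result_%d.txt" % i
        # Each subprocess gets its piece of data as a command-line argument
        # and writes its result to its own output file.
        procs.append((subprocess.Popen(["python", "process_item.py", item, out_path]), out_path))

    # Wait for every subprocess, then merge the per-item output files.
    with open("merged.txt", "w") as merged:
        for proc, out_path in procs:
            proc.wait()
            with open(out_path) as f:
                merged.write(f.read())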

  • 2020-12-01 02:43

    Tasks run sequentially, but they give you the illusion of running in parallel. Tasks are a good fit for file or connection I/O because they are lightweight.

    Multiprocessing with Pool may be the right solution for you, because processes run in parallel, which makes them very good for compute-intensive work: each process runs on its own CPU (or core).

    Setting up multiprocessing can be very easy:

    from multiprocessing import Pool

    def worker(input_item):
        # Do the real per-item work here; pass the item to your processing function.
        output = do_some_work(input_item)
        return output

    if __name__ == '__main__':
        pool = Pool()  # makes one process per CPU (or core) of your PC; use Pool(4) to force 4 processes, for example
        list_of_results = pool.map(worker, input_list)  # launches all the work automatically
    
  • 2020-12-01 02:44

    If you can easily partition and separate your data, it sounds like you should just do that partitioning externally and feed the pieces to several processes of your program (i.e. several processes instead of threads).
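
    One in-process way to sketch the same idea with multiprocessing.Process (the partition count and the per-partition work are just placeholders; the fully external variant would instead launch separate copies of your program, one per partition):

    from multiprocessing import Process

    def process_partition(partition):
        # Stand-in for whatever your program does with one slice of the data.
        for item in partition:
            print(item)

    if __name__ == "__main__":
        data = list(range(100))
        n = 4                                           # number of worker processes
        partitions = [data[i::n] for i in range(n)]     # split the data into n slices

        workers = [Process(target=process_partition, args=(p,)) for p in partitions]
        for w in workers:
            w.start()
        for w in workers:
            w.join()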
