Question
What are the options for achieving parallelism in Python? I want to perform a bunch of CPU-bound calculations over some very large rasters, and would like to parallelise them. Coming from a C background, I am familiar with three approaches to parallelism:
- Message passing processes, possibly distributed across a cluster, e.g. MPI.
- Explicit shared memory parallelism, either using pthreads or fork(), pipe(), et al.
- Implicit shared memory parallelism, using OpenMP.
Deciding on an approach to use is an exercise in trade-offs.
In Python, what approaches are available and what are their characteristics? Is there a clusterable MPI clone? What are the preferred ways of achieving shared memory parallelism? I have heard reference to problems with the GIL, as well as references to tasklets.
In short, what do I need to know about the different parallelization strategies in Python before choosing between them?
Answer 1:
Generally, you describe a CPU-bound calculation. This is not Python's forte. Neither, historically, is multiprocessing.
Threading in the mainstream Python interpreter has been ruled by a dreaded global lock, the GIL. The new multiprocessing API works around that and gives a worker pool abstraction, with pipes and queues and such.
You can write your performance-critical code in C or Cython, and use Python for the glue.
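A minimal sketch of that worker-pool approach, assuming a hypothetical process_tile function standing in for the per-tile raster calculation (which could itself call into a C or Cython extension):

import multiprocessing

def process_tile(tile_index):
    # Hypothetical stand-in for the real CPU-bound raster work;
    # in practice this could delegate to a compiled extension.
    return sum(i * i for i in range(100000)) + tile_index

if __name__ == "__main__":
    # One worker process per core; each worker has its own interpreter,
    # so the GIL does not serialize the computation.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(process_tile, range(32))
    print(results[:4])

pool.map blocks until every tile is done; imap_unordered is an alternative when results can be consumed as they arrive.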
Answer 2:
The new (2.6) multiprocessing module is the way to go. It uses subprocesses, which gets around the GIL problem. It also abstracts away some of the local/remote issues, so the choice of running your code locally or spread out over a cluster can be made later. The module's documentation is a fair bit to chew on, but should provide a good basis to get started.
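On the local/remote point: multiprocessing's managers can serve shared objects such as a task queue over the network, so the worker code stays the same whether it runs on the same box or on other machines in a cluster. A rough sketch (the host name, port, and authkey below are placeholders, not anything prescribed by the module):

from multiprocessing.managers import BaseManager
from queue import Queue

task_queue = Queue()

class ServerManager(BaseManager):
    pass

class WorkerManager(BaseManager):
    pass

# On the machine that owns the work: expose the queue on the network.
ServerManager.register("get_tasks", callable=lambda: task_queue)
server = ServerManager(address=("", 50000), authkey=b"raster")
# server.get_server().serve_forever()   # blocks; run in the server process

# On each worker (same box or another machine): connect instead of serve.
WorkerManager.register("get_tasks")
worker = WorkerManager(address=("server-host", 50000), authkey=b"raster")
# worker.connect()
# tasks = worker.get_tasks()            # proxy to the same queue
# tasks.get() / tasks.put(...) as usual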
Answer 3:
Ray is an elegant (and fast) library for doing this.
The most basic strategy for parallelizing Python functions is to declare a function with the @ray.remote decorator. Then it can be invoked asynchronously.
import ray
import time

# Start the Ray processes (e.g., a scheduler and a shared-memory object store).
ray.init(num_cpus=8)

@ray.remote
def f():
    time.sleep(1)

# This should take one second assuming you have at least 4 cores.
ray.get([f.remote() for _ in range(4)])
You can also parallelize stateful computation using actors, again by using the @ray.remote decorator.
# This assumes you already ran 'import ray' and 'ray.init()'.
import time

@ray.remote
class Counter(object):
    def __init__(self):
        self.x = 0

    def inc(self):
        self.x += 1

    def get_counter(self):
        return self.x

# Create two actors which will operate in parallel.
counter1 = Counter.remote()
counter2 = Counter.remote()

@ray.remote
def update_counters(counter1, counter2):
    for _ in range(1000):
        time.sleep(0.25)
        counter1.inc.remote()
        counter2.inc.remote()

# Start three tasks that update the counters in the background, also in parallel.
update_counters.remote(counter1, counter2)
update_counters.remote(counter1, counter2)
update_counters.remote(counter1, counter2)

# Check the counter values.
for _ in range(5):
    counter1_val = ray.get(counter1.get_counter.remote())
    counter2_val = ray.get(counter2.get_counter.remote())
    print("Counter1: {}, Counter2: {}".format(counter1_val, counter2_val))
    time.sleep(1)
It has a number of advantages over the multiprocessing module:
- The same code runs on a single multi-core machine as well as a large cluster.
- Data is shared efficiently between processes on the same machine using shared memory and efficient serialization.
- You can parallelize Python functions (using tasks) and Python classes (using actors).
- Error messages are propagated nicely.
Ray is a framework I've been helping develop.
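As a small sketch of the shared-memory point above (assuming NumPy is installed): ray.put stores a large array in the object store once, and tasks on the same machine read it without per-call copies.

import numpy as np
import ray

ray.init()  # start Ray if it is not already running

@ray.remote
def column_sums(raster):
    # The array arrives via the shared-memory object store
    # (zero-copy reads for NumPy arrays on the same node),
    # rather than being pickled separately for each task.
    return raster.sum(axis=0)

big_raster = np.ones((4000, 4000))
raster_ref = ray.put(big_raster)  # store once

# All four tasks reference the same stored array.
results = ray.get([column_sums.remote(raster_ref) for _ in range(4)])
print(len(results))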
Answer 4:
There are many packages for this; as others have said, the most appropriate is multiprocessing, especially its Pool class.
A similar result can be obtained with Parallel Python, which in addition is designed to work with clusters.
Anyway, I would say go with multiprocessing.
Answer 5:
Depending on how much data you need to process and how many CPUs/machines you intend to use, it is in some cases better to write part of it in C (or in Java/C# if you want to use Jython/IronPython).
The speedup you can get from that might do more for your performance than running things in parallel on 8 CPUs.
Source: https://stackoverflow.com/questions/2987980/parallelism-in-python