Python multiprocessing BETWEEN Amazon cloud instances

野性不改 2021-01-13 18:38

I'm looking to run a long-running Python analysis process on a few Amazon EC2 instances. The code already runs using the Python multiprocessing module and can

3 Answers
  • 2021-01-13 19:06

    The multiprocessing docs give you a good setup for running across multiple machines. S3 is a good way to share files across EC2 instances, but with multiprocessing you can share queues and pass data directly.

    If you can use Hadoop for parallel tasks, it is a very good way to extract parallelism across machines, but if you need a lot of IPC, building your own solution with multiprocessing isn't that bad.

    Just make sure you put your machines in the same security group :-)
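
A minimal sketch of the shared-queue idea from this answer, using `multiprocessing.managers.BaseManager`. The names `AUTHKEY`, `analyze`, `start_server`, `worker`, and `demo` are illustrative assumptions, not part of the original question. For a self-contained demo, the server and workers all run on localhost in threads; on EC2 the server would bind to an address the workers can reach, and `worker(address)` would run on the other instances.

```python
import queue
import threading
from multiprocessing.managers import BaseManager

AUTHKEY = b"change-me"  # hypothetical shared secret; every machine must use the same one

# Plain queues owned by the server; remote clients reach them through proxies.
task_queue = queue.Queue()
result_queue = queue.Queue()


class QueueManager(BaseManager):
    """Manager that exposes the two queues above over TCP."""


QueueManager.register("get_tasks", callable=lambda: task_queue)
QueueManager.register("get_results", callable=lambda: result_queue)


def analyze(x):
    """Stand-in for the real analysis function (assumption)."""
    return x * x


def start_server(host="127.0.0.1", port=0):
    """Serve the queues from a background thread and return the bound address."""
    manager = QueueManager(address=(host, port), authkey=AUTHKEY)
    server = manager.get_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.address


def worker(address):
    """Pull tasks until a None sentinel arrives, pushing results back."""
    manager = QueueManager(address=address, authkey=AUTHKEY)
    manager.connect()
    tasks, results = manager.get_tasks(), manager.get_results()
    while True:
        item = tasks.get()
        if item is None:
            break
        results.put((item, analyze(item)))


def demo(n_workers=2, n_tasks=5):
    """Local stand-in for a cluster: workers are threads instead of instances."""
    address = start_server()
    for x in range(n_tasks):
        task_queue.put(x)
    for _ in range(n_workers):
        task_queue.put(None)  # one sentinel per worker
    threads = [threading.Thread(target=worker, args=(address,))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(result_queue.get() for _ in range(n_tasks))
```

In a real deployment the chosen port would also have to be allowed between the instances, which is where the security group advice comes in.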

  • 2021-01-13 19:17

    I would use dumbo. It is a Python wrapper for Hadoop that is compatible with Amazon Elastic MapReduce. Write a small wrapper around your code to integrate it with dumbo. Note that you probably need a map-only job with no reduce step.
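
A rough sketch of what such a wrapper might look like. The `analyze` function is a hypothetical stand-in for the existing analysis code, and the `dumbo.run(mapper)` call is written from memory of dumbo's interface, so treat it as an assumption and check it against the dumbo version you install (including how it handles a job with no reducer).

```python
def analyze(record):
    """Stand-in for the real per-record analysis (assumption)."""
    return len(record.split())


def mapper(key, value):
    """Dumbo-style mapper: takes one input record, yields (key, result) pairs."""
    yield value, analyze(value)


if __name__ == "__main__":
    try:
        import dumbo
    except ImportError:
        print("dumbo is not installed; mapper() can still be called directly")
    else:
        # Only a mapper is passed, with the intent of a map-only job.
        dumbo.run(mapper)
```

Keeping the analysis in a plain function and the dumbo glue in a thin mapper means the same code can still be tested locally without Hadoop.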

  • 2021-01-13 19:19

    I've been digging into IPython recently, and it looks like it supports parallel processing across multiple hosts right out of the box:

    http://ipython.org/ipython-doc/stable/html/parallel/index.html
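
As a hedged illustration: the parallel machinery now lives in the separate `ipyparallel` package, and a load-balanced map over running engines might look like the sketch below. The `analyze` and `run_on_cluster` names are hypothetical, and the code assumes a cluster has already been started (for example with `ipcluster start -n 4`) under the default profile.

```python
def analyze(x):
    """Stand-in for the real analysis function (assumption)."""
    return x * x


def run_on_cluster(items):
    """Distribute analyze() over the engines of an already running cluster."""
    import ipyparallel as ipp  # imported here so analyze() works without it

    rc = ipp.Client()               # connects using the default profile
    view = rc.load_balanced_view()  # hands each task to whichever engine is free
    return view.map_sync(analyze, items)
```

With engines running on several hosts, `run_on_cluster(range(10))` would return the results in input order once all tasks finish.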
