How do I parallelize a simple Python loop?

Backend · unresolved · 13 answers · 1265 views
Asked by 北荒, 2020-11-22 11:54

This is probably a trivial question, but how do I parallelize the following loop in python?

# setup output lists
output1 = list()
output2 = list()
output3 = list()

for j in range(0, 10):
    # calc individual parameter value
    parameter = j * offset
    # call the calculation
    out1, out2, out3 = calc_stuff(parameter=parameter)

    # put results into correct output list
    output1.append(out1)
    output2.append(out2)
    output3.append(out3)

13 Answers
  • 2020-11-22 12:26

    I found joblib very useful. Please see the following example:

    from joblib import Parallel, delayed

    def yourfunction(k):
        s = 3.14 * k * k
        print("Area of a circle with radius", k, "is:", s)

    element_run = Parallel(n_jobs=-1)(delayed(yourfunction)(k) for k in range(1, 10))

    n_jobs=-1: use all available cores

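    Applied to the question's loop, Parallel also collects the workers' return values in order, so the three output lists can be rebuilt afterwards. A minimal sketch, assuming a stand-in `calc_stuff`:

    ```python
    from joblib import Parallel, delayed

    def calc_stuff(parameter):
        # stand-in for the question's calc_stuff
        return parameter * 2, parameter * 4, parameter * 8

    # each element of results is the (out1, out2, out3) tuple for one j
    results = Parallel(n_jobs=2)(delayed(calc_stuff)(j) for j in range(10))
    # unzip the tuples back into the three output lists
    output1, output2, output3 = map(list, zip(*results))
    ```

    Because the results come back in input order, this reproduces exactly what the serial loop would have appended.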
  • 2020-11-22 12:28

    Have a look at this:

    http://docs.python.org/library/queue.html

    This might not be the right way to do it, but I'd do something like this.

    Actual code:

    from multiprocessing import Process, JoinableQueue, Queue
    from queue import Empty

    class CustomWorker(Process):
        def __init__(self, work_queue, out1, out2, out3):
            Process.__init__(self)
            self.input = work_queue
            self.out1 = out1
            self.out2 = out2
            self.out3 = out3

        def run(self):
            while True:
                try:
                    # time out so the worker exits once the queue is drained
                    value = self.input.get(timeout=1)
                    temp1, temp2, temp3 = self.calc_stuff(value)
                    self.out1.put(temp1)
                    self.out2.put(temp2)
                    self.out3.put(temp3)
                    self.input.task_done()
                except Empty:
                    return
                    # catch things better here

        def calc_stuff(self, param):
            out1 = param * 2
            out2 = param * 4
            out3 = param * 8
            return out1, out2, out3

    def main():
        input_queue = JoinableQueue()
        for i in range(10):
            input_queue.put(i)
        out1 = Queue()
        out2 = Queue()
        out3 = Queue()
        processes = []
        for x in range(2):
            p = CustomWorker(input_queue, out1, out2, out3)
            p.daemon = True
            p.start()
            processes.append(p)
        input_queue.join()
        while not out1.empty():
            print(out1.get())
            print(out2.get())
            print(out3.get())

    if __name__ == '__main__':
        main()
    

    Hope that helps.

  • 2020-11-22 12:28

    thanks @iuryxavier

    from multiprocessing import Pool
    from multiprocessing import cpu_count


    def add_1(x):
        return x + 1

    if __name__ == "__main__":
        pool = Pool(cpu_count())
        results = pool.map(add_1, range(10**6))  # 10**12 as originally posted would exhaust memory
        pool.close()  # stop accepting new tasks
        pool.join()   # wait for the workers to finish
    
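    For the question's three output lists specifically, pool.map returns results in input order, so they can be unzipped afterwards. A sketch, with `calc_stuff` as a stand-in:

    ```python
    from multiprocessing import Pool

    def calc_stuff(parameter):
        # stand-in for the question's calc_stuff
        return parameter * 2, parameter * 4, parameter * 8

    if __name__ == "__main__":
        with Pool() as pool:  # defaults to os.cpu_count() workers
            results = pool.map(calc_stuff, range(10))
        # unzip the (out1, out2, out3) tuples into the three lists
        output1, output2, output3 = map(list, zip(*results))
    ```

    The `with` block replaces the explicit close()/join() pair: the pool is torn down automatically on exit.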
  • 2020-11-22 12:29

    Let's say we have an async function

    async def work_async(self, student_name: str, code: str, loop):
        """
        Some async function
        """
        # Do some async processing

    that needs to be run over a large array. Some attributes are passed in to the program, and some come from a property of a dictionary element in the array.

    async def process_students(self, student_name: str, loop):
        market = sys.argv[2]
        subjects = [...] #Some large array
        batchsize = 5
        for i in range(0, len(subjects), batchsize):
            batch = subjects[i:i+batchsize]
            await asyncio.gather(*(self.work_async(student_name,
                                               sub['Code'],
                                               loop)
                           for sub in batch))
    
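    A self-contained version of the same batching pattern, with free functions instead of methods, `work_async` stubbed out, and the whole thing driven by asyncio.run (the subjects array and the upper-casing "work" are illustrative stand-ins):

    ```python
    import asyncio

    async def work_async(code: str) -> str:
        # stand-in for the real async processing
        await asyncio.sleep(0)
        return code.upper()

    async def process_all():
        subjects = [{"Code": f"c{i}"} for i in range(12)]  # stand-in for the large array
        batchsize = 5
        results = []
        for i in range(0, len(subjects), batchsize):
            batch = subjects[i:i + batchsize]
            # run up to batchsize coroutines concurrently, then move on
            results += await asyncio.gather(*(work_async(sub["Code"]) for sub in batch))
        return results

    results = asyncio.run(process_all())
    ```

    Batching like this caps how many coroutines are in flight at once, which matters when each one holds a connection or other limited resource.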
  • 2020-11-22 12:30

    A very simple example of parallel processing is

    from multiprocessing import Process

    offset = 2  # undefined in the question; stand-in value

    def calc_stuff(parameter):
        # stand-in for the question's calc_stuff
        return parameter * 2, parameter * 4, parameter * 8

    output1 = list()
    output2 = list()
    output3 = list()

    def yourfunction():
        for j in range(0, 10):
            # calc individual parameter value
            parameter = j * offset
            # call the calculation
            out1, out2, out3 = calc_stuff(parameter=parameter)

            # put results into correct output list
            output1.append(out1)
            output2.append(out2)
            output3.append(out3)

    if __name__ == '__main__':
        p = Process(target=yourfunction)
        p.start()
        p.join()

    Note that lists appended to inside the child process are not visible to the parent; to get the results back you need a shared structure such as a multiprocessing.Manager list, a Queue, or a Pool.
    
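    Since plain lists appended to in a child process are not shared with the parent, here is a hedged variant of the same example using multiprocessing.Manager to share the output lists (`offset` and `calc_stuff` are stand-ins for the question's undefined names):

    ```python
    from multiprocessing import Manager, Process

    offset = 2  # stand-in for the question's undefined offset

    def calc_stuff(parameter):
        # stand-in for the question's calc_stuff
        return parameter * 2, parameter * 4, parameter * 8

    def yourfunction(output1, output2, output3):
        for j in range(10):
            out1, out2, out3 = calc_stuff(parameter=j * offset)
            # these are Manager list proxies, so the parent sees the appends
            output1.append(out1)
            output2.append(out2)
            output3.append(out3)

    if __name__ == '__main__':
        with Manager() as manager:
            output1 = manager.list()
            output2 = manager.list()
            output3 = manager.list()
            p = Process(target=yourfunction, args=(output1, output2, output3))
            p.start()
            p.join()
            # copy out of the proxies before the manager shuts down
            results1, results2, results3 = list(output1), list(output2), list(output3)
    ```

    Manager proxies go through a server process, so this is convenient but slower than a Pool; for a single worker loop like this, Pool.map is usually the better fit.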
  • 2020-11-22 12:33

    There are a number of advantages to using Ray:

    • You can parallelize over multiple machines in addition to multiple cores (with the same code).
    • Efficient handling of numerical data through shared memory (and zero-copy serialization).
    • High task throughput with distributed scheduling.
    • Fault tolerance.

    In your case, you could start Ray and define a remote function

    import ray
    
    ray.init()
    
    @ray.remote(num_return_vals=3)
    def calc_stuff(parameter=None):
        # Do something.
        return 1, 2, 3
    

    and then invoke it in parallel

    output1, output2, output3 = [], [], []
    
    # Launch the tasks.
    for j in range(10):
        id1, id2, id3 = calc_stuff.remote(parameter=j)
        output1.append(id1)
        output2.append(id2)
        output3.append(id3)
    
    # Block until the results have finished and get the results.
    output1 = ray.get(output1)
    output2 = ray.get(output2)
    output3 = ray.get(output3)
    

    To run the same example on a cluster, the only line that would change would be the call to ray.init(). The relevant details can be found in the Ray documentation.

    Note that I'm helping to develop Ray.
