How to run functions in parallel?

我在风中等你 2020-11-22 10:32

I researched first and couldn't find an answer to my question. I am trying to run multiple functions in parallel in Python.

I have something like this:



        
6 Answers
  • 2020-11-22 10:53

    If your functions are mainly doing I/O work (and less CPU work) and you have Python 3.2+, you can use a ThreadPoolExecutor:

    from concurrent.futures import ThreadPoolExecutor
    
    def run_io_tasks_in_parallel(tasks):
        with ThreadPoolExecutor() as executor:
            running_tasks = [executor.submit(task) for task in tasks]
            for running_task in running_tasks:
                running_task.result()
    
    run_io_tasks_in_parallel([
        lambda: print('IO task 1 running!'),
        lambda: print('IO task 2 running!'),
    ])
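
The helper above waits for every task but discards return values, and the first failing task aborts the loop. As a minimal sketch of one possible variant (the names run_io_tasks_collecting_results and failing_task are hypothetical, not from the answer), as_completed can be used to gather results and exceptions per task:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_io_tasks_collecting_results(tasks):
    """Run zero-argument callables in threads; return (results, errors) keyed by task index."""
    results, errors = {}, {}
    with ThreadPoolExecutor() as executor:
        # Map each future back to the position of its task in the input list.
        future_to_index = {executor.submit(task): i for i, task in enumerate(tasks)}
        for future in as_completed(future_to_index):
            index = future_to_index[future]
            try:
                results[index] = future.result()
            except Exception as exc:  # result() re-raises whatever the task raised
                errors[index] = exc
    return results, errors


def failing_task():
    # Hypothetical task used only to show how an exception is reported.
    raise ValueError("simulated I/O failure")


results, errors = run_io_tasks_collecting_results([
    lambda: "task 0 done",
    failing_task,
    lambda: "task 2 done",
])
print(results)
print(sorted(errors))
```

Keying by index keeps each result tied to its task even though as_completed yields futures in completion order rather than submission order.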
    

    If your functions are mainly doing CPU work (and less I/O work) and you have Python 2.6+, you can use the multiprocessing module:

    from multiprocessing import Process
    
    def run_cpu_tasks_in_parallel(tasks):
        running_tasks = [Process(target=task) for task in tasks]
        for running_task in running_tasks:
            running_task.start()
        for running_task in running_tasks:
            running_task.join()
    
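    def cpu_task_1():
        print('CPU task 1 running!')
    
    def cpu_task_2():
        print('CPU task 2 running!')
    
    # Named top-level functions are used instead of lambdas because lambdas
    # cannot be pickled when child processes are spawned (Windows, macOS).
    if __name__ == '__main__':
        run_cpu_tasks_in_parallel([cpu_task_1, cpu_task_2])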
    
  • 2020-11-22 10:56

    This can be done elegantly with Ray, a system that allows you to easily parallelize and distribute your Python code.

    To parallelize your example, you'd need to define your functions with the @ray.remote decorator, and then invoke them with .remote.

    import ray
    
    ray.init()
    
    dir1 = 'C:\\folder1'
    dir2 = 'C:\\folder2'
    filename = 'test.txt'
    addFiles = [25, 5, 15, 35, 45, 25, 5, 15, 35, 45]
    
    # Define the functions. 
    # You need to pass every global variable used by the function as an argument.
    # This is needed because each remote function runs in a different process,
    # and thus it does not have access to the global variables defined in 
    # the current process.
    @ray.remote
    def func1(filename, addFiles, dir):
        pass  # func1() code here...
    
    @ray.remote
    def func2(filename, addFiles, dir):
        pass  # func2() code here...
    
    # Start two tasks in the background and wait for them to finish.
    ray.get([func1.remote(filename, addFiles, dir1), func2.remote(filename, addFiles, dir2)]) 
    

    If you pass the same argument to both functions and the argument is large, it is more efficient to store it once with ray.put() and pass the returned reference. This avoids serializing the large argument twice and creating two copies of it in memory:

    largeData_id = ray.put(largeData)
    
    # Remote functions must be invoked with .remote(), not called directly.
    ray.get([func1.remote(largeData_id), func2.remote(largeData_id)])
    

    Important - If func1() and func2() return results, you need to rewrite the code as follows:

    ret_id1 = func1.remote(filename, addFiles, dir1)
    ret_id2 = func2.remote(filename, addFiles, dir2)
    ret1, ret2 = ray.get([ret_id1, ret_id2])
    

    There are a number of advantages of using Ray over the multiprocessing module. In particular, the same code will run on a single machine as well as on a cluster of machines. For more advantages of Ray see this related post.

  • 2020-11-22 10:56

    It seems you have a single function that you need to call with different arguments. This can be done elegantly by combining concurrent.futures and map in Python 3.2+:

    import time
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
    
    def sleep_secs(seconds):
      time.sleep(seconds)
      print(f'{seconds} has been processed')
    
    secs_list = [2, 4, 6, 8, 10, 12]
    

    Now, if your operation is IO bound, then you can use the ThreadPoolExecutor as such:

    with ThreadPoolExecutor() as executor:
      results = executor.map(sleep_secs, secs_list)
    

    Note how map is used here to map your function to the list of arguments.

    If your function is CPU bound, you can use ProcessPoolExecutor instead:

    with ProcessPoolExecutor() as executor:
      results = executor.map(sleep_secs, secs_list)
    

    If you are not sure, you can simply try both and see which one gives you better results.
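
One way to "try both" is to time the same workload under each executor. The sketch below is only an illustration: timed_map and fake_io are hypothetical names, and the workload is a sleep standing in for real I/O. It compares a sequential run with a ThreadPoolExecutor run.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_map(executor_cls, fn, items, max_workers=None):
    """Run fn over items with the given executor class; return (results, elapsed_seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=max_workers) as executor:
        results = list(executor.map(fn, items))  # list() waits for every result
    return results, time.perf_counter() - start


def fake_io(seconds):
    # Hypothetical stand-in for an I/O-bound call: it only sleeps.
    time.sleep(seconds)
    return seconds


delays = [0.2, 0.2, 0.2, 0.2]

# Baseline: run the calls one after another.
start = time.perf_counter()
sequential_results = [fake_io(d) for d in delays]
sequential_elapsed = time.perf_counter() - start

# Same calls, one thread per item.
threaded_results, threaded_elapsed = timed_map(ThreadPoolExecutor, fake_io, delays, max_workers=4)

print(f"sequential: {sequential_elapsed:.2f}s, threads: {threaded_elapsed:.2f}s")
```

Passing ProcessPoolExecutor as executor_cls should time the process-based variant in the same way, provided fn is a top-level function and the call sits under an `if __name__ == "__main__":` guard on platforms that spawn processes.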

    Finally, if your function returns a value, you can iterate over the results to print them. (sleep_secs above returns nothing, so this prints None for each item.)

    with ThreadPoolExecutor() as executor:
      results = executor.map(sleep_secs, secs_list)
      for result in results:
        print(result)
    
  • 2020-11-22 11:08

    There's no way to guarantee that two functions will execute in sync with each other, which seems to be what you want to do.

    The best you can do is to split each function into several steps, then wait for both to finish at critical synchronization points using Process.join, as @aix's answer mentions.

    This is better than time.sleep(10) because you can't guarantee exact timings. By waiting explicitly, you state that both functions must finish a step before either moves on to the next, instead of assuming the step will be done within 10 seconds, which isn't guaranteed and depends on what else is running on the machine.
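
The step-by-step idea can be sketched as follows. To keep the example short it uses threading.Thread.join rather than Process.join (a deliberate swap; the pattern is the same with multiprocessing.Process), and run_step and make_step are hypothetical helpers, not part of any answer here.

```python
import threading


def run_step(*fns):
    """Start every function in its own thread and block until all of them return."""
    threads = [threading.Thread(target=fn) for fn in fns]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # synchronization point: nothing below runs until this step is done


log = []
log_lock = threading.Lock()


def make_step(worker, step):
    # Hypothetical helper: builds a function that records which step a worker ran.
    def step_fn():
        with log_lock:
            log.append((worker, step))
    return step_fn


# Step 1 of both workers runs concurrently; step 2 starts only after both finish.
run_step(make_step("A", 1), make_step("B", 1))
run_step(make_step("A", 2), make_step("B", 2))

print(log)
```

Within a step the order of A and B is not guaranteed, but every step 1 entry is recorded before any step 2 entry.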

  • 2020-11-22 11:10

    If you are a Windows user on Python 3, this post may help you with parallel programming. When you run ordinary multiprocessing Pool code, you can get an error about the main function of your program. This happens because Windows has no fork(), so each worker process starts a fresh interpreter and imports the main module again. The post below gives a solution to this problem:

    http://python.6.x6.nabble.com/Multiprocessing-Pool-woes-td5047050.html

    Since I was using Python 3, I changed the program a little, like this:

    from types import FunctionType
    import marshal
    
    def _applicable(*args, **kwargs):
      name = kwargs['__pw_name']
      code = marshal.loads(kwargs['__pw_code'])
      gbls = globals() #gbls = marshal.loads(kwargs['__pw_gbls'])
      defs = marshal.loads(kwargs['__pw_defs'])
      clsr = marshal.loads(kwargs['__pw_clsr'])
      fdct = marshal.loads(kwargs['__pw_fdct'])
      func = FunctionType(code, gbls, name, defs, clsr)
      func.fdct = fdct
      del kwargs['__pw_name']
      del kwargs['__pw_code']
      del kwargs['__pw_defs']
      del kwargs['__pw_clsr']
      del kwargs['__pw_fdct']
      return func(*args, **kwargs)
    
    def make_applicable(f, *args, **kwargs):
      if not isinstance(f, FunctionType): raise ValueError('argument must be a function')
      kwargs['__pw_name'] = f.__name__  # edited
      kwargs['__pw_code'] = marshal.dumps(f.__code__)   # edited
      kwargs['__pw_defs'] = marshal.dumps(f.__defaults__)  # edited
      kwargs['__pw_clsr'] = marshal.dumps(f.__closure__)  # edited
      kwargs['__pw_fdct'] = marshal.dumps(f.__dict__)   # edited
      return _applicable, args, kwargs
    
    def _mappable(x):
      x,name,code,defs,clsr,fdct = x
      code = marshal.loads(code)
      gbls = globals() #gbls = marshal.loads(gbls)
      defs = marshal.loads(defs)
      clsr = marshal.loads(clsr)
      fdct = marshal.loads(fdct)
      func = FunctionType(code, gbls, name, defs, clsr)
      func.fdct = fdct
      return func(x)
    
    def make_mappable(f, iterable):
      if not isinstance(f, FunctionType): raise ValueError('argument must be a function')
      name = f.__name__    # edited
      code = marshal.dumps(f.__code__)   # edited
      defs = marshal.dumps(f.__defaults__)  # edited
      clsr = marshal.dumps(f.__closure__)  # edited
      fdct = marshal.dumps(f.__dict__)  # edited
      return _mappable, ((i,name,code,defs,clsr,fdct) for i in iterable)
    

    With the helpers above saved as poolable.py (the module name the import below expects), the problem code from that post also changes slightly:

    from multiprocessing import Pool
    from poolable import make_applicable, make_mappable
    
    def cube(x):
      return x**3
    
    if __name__ == "__main__":
      pool    = Pool(processes=2)
      results = [pool.apply_async(*make_applicable(cube,x)) for x in range(1,7)]
      print([result.get(timeout=10) for result in results])
    

    This gave the following output:

    [1, 8, 27, 64, 125, 216]
    

    I hope this post is useful for some Windows users.
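
For comparison, when the worker function is an ordinary module-level function, the marshal helpers are not needed. A minimal sketch, assuming the function is defined at the top level of the script and the pool is created under the `__main__` guard (the main wrapper is a hypothetical name):

```python
from multiprocessing import Pool


def cube(x):
    # Defined at module level so worker processes can import it by name.
    return x ** 3


def main():
    with Pool(processes=2) as pool:
        print(pool.map(cube, range(1, 7)))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block
    # when they import the main module.
    main()
```

Run as a script, this should print the same list of cubes as the output shown above.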

  • 2020-11-22 11:13

    You could use threading or multiprocessing.

    Because of CPython's global interpreter lock (GIL), only one thread executes Python bytecode at a time, so threading is unlikely to achieve true parallelism for CPU-bound work. For this reason, multiprocessing is generally a better bet.

    Here is a complete example:

    from multiprocessing import Process
    
    def func1():
      print('func1: starting')
      for i in range(10000000): pass  # busy loop standing in for real work
      print('func1: finishing')
    
    def func2():
      print('func2: starting')
      for i in range(10000000): pass  # busy loop standing in for real work
      print('func2: finishing')
    
    if __name__ == '__main__':
      p1 = Process(target=func1)
      p1.start()
      p2 = Process(target=func2)
      p2.start()
      p1.join()
      p2.join()
    

    The mechanics of starting/joining child processes can easily be encapsulated into a function along the lines of your runBothFunc:

    def runInParallel(*fns):
      proc = []
      for fn in fns:
        p = Process(target=fn)
        p.start()
        proc.append(p)
      for p in proc:
        p.join()
    
    runInParallel(func1, func2)
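
runInParallel waits for the child processes but cannot hand back what the functions return. One possible extension, sketched here with ProcessPoolExecutor instead of raw Process objects (run_in_parallel_with_results and count_up_to are hypothetical names), collects the return values in input order:

```python
from concurrent.futures import ProcessPoolExecutor


def count_up_to(n):
    # Hypothetical CPU-bound task: a busy loop that returns how far it counted.
    total = 0
    for _ in range(n):
        total += 1
    return total


def run_in_parallel_with_results(calls):
    """Run (function, args) pairs in separate processes; return results in input order."""
    with ProcessPoolExecutor() as executor:
        futures = [executor.submit(fn, *args) for fn, args in calls]
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(run_in_parallel_with_results([
        (count_up_to, (1_000_000,)),
        (count_up_to, (2_000_000,)),
    ]))
```

The functions and their arguments must be picklable, which is why the task is a top-level function rather than a lambda.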
    