Queue vs JoinableQueue in Python

没有蜡笔的小新 2021-02-07 01:30

When using Python's multiprocessing module, there are two kinds of queues:

  • Queue
  • JoinableQueue

What is the difference between them?

  •  遥遥无期
    2021-02-07 02:11

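    Based on the documentation, it's hard to be sure that a Queue is actually empty: the docs say Queue.empty() and qsize() are not reliable because of multithreading/multiprocessing semantics. JoinableQueue adds two methods, task_done() and join(). Each consumer calls q.task_done() after it finishes an item it got from the queue, and q.join() blocks until every item that has been put on the queue has been marked done. If you want to complete work in distinct batches and do something discrete at the end of each batch, this can be helpful.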
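
A minimal sketch of the task_done()/join() contract, assuming a single worker process and a toy squaring task; `square_worker`, `squares_via_joinable_queue` and the `None` stop sentinel are hypothetical names chosen for illustration. join() only returns once task_done() has been called for every item that was put; a plain multiprocessing Queue has no task_done() or join() at all.

```python
import multiprocessing as mp

def square_worker(tasks, results):
    # Each item taken with get() is matched by exactly one task_done().
    for item in iter(tasks.get, None):
        try:
            results.put(item * item)
        finally:
            tasks.task_done()
    tasks.task_done()  # also mark the None sentinel as done

def squares_via_joinable_queue(values):
    tasks, results = mp.JoinableQueue(), mp.Queue()
    worker = mp.Process(target=square_worker, args=(tasks, results))
    worker.start()
    for v in values:
        tasks.put(v)
    # Blocks until task_done() has been called once for every item put so far.
    tasks.join()
    squares = sorted(results.get() for _ in values)
    tasks.put(None)  # sentinel: tell the worker to exit
    worker.join()
    return squares

if __name__ == '__main__':
    print(squares_via_joinable_queue(range(5)))  # [0, 1, 4, 9, 16]
```

The results are drained before worker.join() so the child is never left waiting to flush data into a pipe nobody reads.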

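    For example, perhaps you process 1000 items at a time through the queue, then send a push notification to a user that you've completed another batch. This would be harder to implement with a normal Queue, because an empty Queue only tells you the items were taken off it, not that the consumers have finished processing them.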
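
For comparison, here is one way to get a per-batch wait with a plain Queue, sketched under assumptions: consumers acknowledge each finished item on a second Queue and the producer counts acknowledgements before moving on. `consumer`, `run_batches` and the `None` sentinel are hypothetical names. The extra queue and the counting are the bookkeeping that JoinableQueue's task_done()/join() otherwise do for you.

```python
import multiprocessing as mp

STOP = None  # sentinel that tells a consumer to exit

def consumer(tasks, done):
    for item in iter(tasks.get, STOP):
        # process(item) would go here (hypothetical placeholder).
        # A plain Queue has no task_done(), so acknowledge completion by hand.
        done.put(item)

def run_batches(items, batch_size, n_consumers=2):
    tasks, done = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=consumer, args=(tasks, done))
               for _ in range(n_consumers)]
    for w in workers:
        w.start()
    finished_batches = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        for item in batch:
            tasks.put(item)
        # Instead of join(): block until one acknowledgement per item arrives.
        finished_batches.append(sorted(done.get() for _ in batch))
        # notify_users() would go here (hypothetical).
    for _ in workers:
        tasks.put(STOP)
    for w in workers:
        w.join()
    return finished_batches

if __name__ == '__main__':
    print(run_batches(list(range(5)), batch_size=2))  # [[0, 1], [2, 3], [4]]
```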

    With a JoinableQueue, it might look something like:

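    import logging
    import multiprocessing as mp
    
    logger = logging.getLogger(__name__)
    
    BATCH_SIZE = 1000
    STOP_VALUE = 'STOP'
    # Hypothetical placeholders so the example runs; replace with your real work.
    URLS = ['https://example.com/page%d' % n for n in range(2500)]
    
    def expensive_func(url):
      return url.upper()
    
    def process(item):
      pass
    
    def notify_users():
      print('Finished another batch')
    
    def consume(q):
      for item in iter(q.get, STOP_VALUE):
        try:
          process(item)
        # Be very defensive about errors since they can corrupt pipes.
        except Exception as e:
          logger.error(e)
        finally:
          # Mark the item as handled so q.join() can eventually return.
          q.task_done()
    
    if __name__ == '__main__':
      q = mp.JoinableQueue()
      # Run consumers as separate Processes that receive q when they are created:
      # passing a multiprocessing queue to a Pool task fails with a RuntimeError,
      # and consumers occupying every Pool worker would leave none for expensive_func.
      consumers = [mp.Process(target=consume, args=(q,))
                   for _ in range(mp.cpu_count())]
      for c in consumers:
        c.start()
      with mp.Pool() as pool:
        for i in range(0, len(URLS), BATCH_SIZE):
          # Put each result of this batch on the queue as soon as it's ready.
          for result in pool.imap_unordered(expensive_func, URLS[i:i+BATCH_SIZE]):
            q.put(result)
          # Wait until every item put so far has been marked done.
          q.join()
          notify_users()
      # Stop the consumers so we can exit cleanly.
      for _ in consumers:
        q.put(STOP_VALUE)
      for c in consumers:
        c.join()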
    

    NB: I haven't actually run this code. join() only waits for items that have already been put on the queue: if consumers pull items off faster than the producer puts them on (for example, when items are put asynchronously from a callback), join() can return before the whole batch is done. In that case you'd send an update AT LEAST every 1000 items, and maybe more often. For progress updates, that's probably OK. If it's important to be exactly 1000, you could count processed items in an mp.Value('i', 0) and check that it has reached 1000 whenever join() returns.
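
A minimal sketch of the mp.Value idea, assuming consumers increment a shared counter under its lock before calling task_done(), and that items are put from a background thread so join() may release before the whole batch is in. `consume`, `produce`, `wait_until_processed` and `run_batch` are hypothetical names, and the 10 ms polling interval is an arbitrary choice.

```python
import multiprocessing as mp
import threading
import time

def consume(q, processed):
    for item in iter(q.get, None):
        # process(item) would go here (hypothetical placeholder).
        with processed.get_lock():
            processed.value += 1  # count the item first, then mark it done,
        q.task_done()             # so join() never releases on an uncounted item
    q.task_done()  # account for the None sentinel

def produce(q, n):
    for i in range(n):
        q.put(i)

def wait_until_processed(q, processed, target, poll=0.01):
    # join() can release while the producer is still putting items, so after
    # each release check the shared counter and wait again until it hits target.
    while True:
        q.join()
        if processed.value >= target:
            return processed.value
        time.sleep(poll)

def run_batch(batch_size, n_workers=2):
    q, processed = mp.JoinableQueue(), mp.Value('i', 0)
    workers = [mp.Process(target=consume, args=(q, processed))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    # Put the batch from a background thread, so join() may release early.
    producer = threading.Thread(target=produce, args=(q, batch_size))
    producer.start()
    count = wait_until_processed(q, processed, batch_size)
    producer.join()
    for _ in workers:
        q.put(None)  # stop each consumer
    for w in workers:
        w.join()
    return count

if __name__ == '__main__':
    print(run_batch(1000))  # 1000
```

The counter is incremented inside get_lock() because `+=` on a shared Value is a separate read and write; the lock makes the pair atomic across processes.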
