I need a queue which multiple threads can put stuff into, and multiple threads may read from.
Python has at least two queue classes, Queue.Queue and collections.dequ
deque
is thread-safe. "operations that do not require locking" means that you don't have to do the locking yourself, the deque
takes care of it.
Taking a look at the Queue
source, the internal deque is called self.queue
and uses a mutex for accessors and mutations, so Queue().queue
is not thread-safe to use.
If you're looking for an "in" operator, then a deque or queue is possibly not the most appropriate data structure for your problem.
If all you're looking for is a thread-safe way to transfer objects between threads, then both would work (both for FIFO and LIFO). For FIFO:
Note:
deque
might not be thread safe, I'm not sure.deque
does not block on pop()
or popleft()
so you can't base your consumer thread flow on blocking till a new item arrives.However, it seems that deque has a significant efficiency advantage. Here are some benchmark results in seconds using CPython 2.7.3 for inserting and removing 100k items
deque 0.0747888759791
Queue 1.60079066852
Here's the benchmark code:
import time
import Queue
import collections
q = collections.deque()
t0 = time.clock()
for i in xrange(100000):
q.append(1)
for i in xrange(100000):
q.popleft()
print 'deque', time.clock() - t0
q = Queue.Queue(200000)
t0 = time.clock()
for i in xrange(100000):
q.put(1)
for i in xrange(100000):
q.get()
print 'Queue', time.clock() - t0
Adding notify_all()
to each deque
append
and popleft
results in far worse results for deque
than the 20x improvement achieved by default deque behavior:
deque + notify_all: 0.469802
Queue: 0.667279
@Jonathan modify his code a little and I get the benchmark using cPython 3.6.2 and add condition in deque loop to simulate the behaviour Queue do.
import time
from queue import Queue
import threading
import collections
mutex = threading.Lock()
condition = threading.Condition(mutex)
q = collections.deque()
t0 = time.clock()
for i in range(100000):
with condition:
q.append(1)
condition.notify_all()
for _ in range(100000):
with condition:
q.popleft()
condition.notify_all()
print('deque', time.clock() - t0)
q = Queue(200000)
t0 = time.clock()
for _ in range(100000):
q.put(1)
for _ in range(100000):
q.get()
print('Queue', time.clock() - t0)
And it seems the performance limited by
this function condition.notify_all()
collections.deque is an alternative implementation of unbounded queues with fast atomic append() and popleft() operations that do not require locking. docs Queue
All single-element methods on deque
are atomic and thread-safe. All other methods are thread-safe too. Things like len(dq)
, dq[4]
yield momentary correct values. But think e.g. about dq.extend(mylist)
: you don't get a guarantee that all elements in mylist
are filed in a row when other threads also append elements on the same side - but thats usually not a requirement in inter-thread communication and for the questioned task.
So a deque
is ~20x faster than Queue
(which uses a deque
under the hood) and unless you don't need the "comfortable" synchronization API (blocking / timeout), the strict maxsize
obeyance or the "Override these methods (_put, _get, ..) to implement other queue organizations" sub-classing behavior, or when you take care of such things yourself, then a bare deque
is a good and efficient deal for high-speed inter-thread communication.
In fact the heavy usage of an extra mutex and extra method ._get()
etc. method calls in Queue.py
is due to backwards compatibility constraints, past over-design and lack of care for providing an efficient solution for this important speed bottleneck issue in inter-thread communication. A list was used in older Python versions - but even list.append()/.pop(0) was & is atomic and threadsafe ...
Queue.Queue
and collections.deque
serve different purposes. Queue.Queue is intended for allowing different threads to communicate using queued messages/data, whereas collections.deque
is simply intended as a datastructure. That's why Queue.Queue
has methods like put_nowait()
, get_nowait()
, and join()
, whereas collections.deque
doesn't. Queue.Queue
isn't intended to be used as a collection, which is why it lacks the likes of the in
operator.
It boils down to this: if you have multiple threads and you want them to be able to communicate without the need for locks, you're looking for Queue.Queue
; if you just want a queue or a double-ended queue as a datastructure, use collections.deque
.
Finally, accessing and manipulating the internal deque of a Queue.Queue
is playing with fire - you really don't want to be doing that.
For information there is a Python ticket referenced for deque thread-safety (https://bugs.python.org/issue15329). Title "clarify which deque methods are thread-safe"
Bottom line here: https://bugs.python.org/issue15329#msg199368
The deque's append(), appendleft(), pop(), popleft(), and len(d) operations are thread-safe in CPython. The append methods have a DECREF at the end (for cases where maxlen has been set), but this happens after all of the structure updates have been made and the invariants have been restored, so it is okay to treat these operations as atomic.
Anyway, if you are not 100% sure and you prefer reliability over performance, just put a like Lock ;)