I want to iterate over a list using 2 thread. One from leading and other from trailing, and put the elements in a Queue
on each iteration. But before putting th
A couple of things that I think can be improved:
threading
) module. In general, CPython threading will not cause CPU intensive work to speed up. (Depending on what exactly is the context of your question, it's also possible that multiprocessing
won't, but threading
almost certainly won't.)is_inqueue
would likely lead to high contention.The locked time seems linear in the number of items that need to be traversed:
def is_in_queue(x, q):
with q.mutex:
return x in q.queue
So, instead, you could possibly do the following.
Use multiprocessing
with a shared dict
:
from multiprocessing import Process, Manager
manager = Manager()
d = manager.dict()
# Fn definitions and such
p1 = Process(target=p1, args=(d,))
p2 = Process(target=p2, args=(d,))
within each function, check for the item like this:
def p1(d):
# Stuff
if 'foo' in d:
return
Before we get into bigger problems, you're not using Queue.join right.
The whole point of this function is that a producer who adds a bunch of items to a queue can wait until the consumer or consumers have finished working on all of those items. This works by having the consumer call task_done
after they finish working on each item that they pulled off with get
. Once there have been as many task_done
calls as put
calls, the queue is done. You're not doing a get
anywhere, much less a task_done
, so there's no way the queue can ever be finished. So, that's why you block forever after the two threads finish.
The first problem here is that your threads are doing almost no work outside of the actual synchronization. If the only thing they do is fight over a queue, only one of them is going to be able to run at a time.
Of course that's common in toy problems, but you have to think through your real problem:
yield from
to use coroutines to build things like schedulers or many-actor simulations.Next, as I alluded to in your previous question, making threads (or processes) work efficiently with shared state requires holding locks for as short a time as possible.
So, if you have to search a whole queue under a lock, that had better be a constant-time search, not a linear-time search. That's why I suggested using something like an OrderedSet
recipe rather than a list
, like the one inside the stdlib's Queue.Queue
. Then this function:
def is_in_queue(x, q):
with q.mutex:
return x in q.queue
… is only blocking the queue for a tiny fraction of a second—just long enough to look up a hash value in a table, instead of long enough to compare every element in the queue against x
.
Finally, I tried to explain about race conditions on your other question, but let me try again.
You need a lock around every complete "transaction" in your code, not just around the individual operations.
For example, if you do this:
with queue locked:
see if x is in the queue
if x was not in the queue:
with queue locked:
add x to the queue
… then it's always possible that x was not in the queue when you checked, but in the time between when you unlocked it and relocked it, someone added it. This is exactly why it's possible for both threads to stop early.
To fix this, you need to put a lock around the whole thing:
with queue locked:
if x is not in the queue:
add x to the queue
Of course this goes directly against what I said before about locking the queue for as short a time as possible. Really, that's what makes multithreading hard in a nutshell. It's easy to write safe code that just locks everything for as long as might conceivably be necessary, but then your code ends up only using a single core, while all the other threads are blocked waiting for the lock. And it's easy to write fast code that just locks everything as briefly as possible, but then it's unsafe and you get garbage values or even crashes all over the place. Figuring out what needs to be a transaction, and how to minimize the work inside those transactions, and how to deal with the multiple locks you'll probably need to make that work without deadlocking them… that's not so easy.