I'm writing a program where a large number of agents listen for events and react to them. Since Control.Concurrent.Chan.dupChan is deprecated, I decided to use TChan instead.
This is a great test case! I think you've actually created a rare instance of genuine livelock/starvation. We can test this by compiling with -eventlog and running with -vst, or by compiling with -debug and running with -Ds. We see that even as the program "hangs", the runtime is still working like crazy, jumping between blocked threads.
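For reference, the invocations look roughly like this (assuming the module is Main.hs; -threaded and -rtsopts are added here so the RTS options can be passed on the command line):

    ghc -O2 -threaded -rtsopts -eventlog Main.hs
    ./Main +RTS -N -vst      # trace scheduler events to stderr

    ghc -O2 -threaded -rtsopts -debug Main.hs
    ./Main +RTS -N -Ds       # scheduler debug output (debug RTS)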
The high-level reason is that you have one (fast) writer and many (fast) readers. The readers and the writer both need to access the same TVar representing the end of the queue. Let's say that, nondeterministically, one thread succeeds and all others fail when this happens. Now, as we increase the number of threads in contention to 100*100, the probability of any given reader making progress rapidly goes towards zero. In the meantime, the writer's access to that TVar in fact takes longer than the readers', which makes things even worse for it.
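For concreteness, here is a minimal sketch of the shape being described: one fast writer and 10,000 readers all blocked on duplicates of the same TChan. The counts and loop bodies are assumptions for illustration, not the original code.

    import Control.Concurrent (forkIO)
    import Control.Concurrent.STM
    import Control.Monad (forM_, forever, replicateM_, void)

    main :: IO ()
    main = do
      chan <- newTChanIO
      -- 10,000 reader threads, each blocking on its own duplicate of the
      -- channel, so every write wakes all of them.
      replicateM_ 10000 $ do
        myChan <- atomically (dupTChan chan)
        void $ forkIO $ forever $
          void $ atomically $ readTChan myChan
      -- One writer stuffing events into the channel as fast as it can.
      forM_ [1 .. 100000 :: Int] $ \i ->
        atomically (writeTChan chan i)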
In this instance, putting a tiny throttle between each invocation of go for the writer (say, threadDelay 100) is enough to fix the problem. It gives the readers enough time to all block between successive writes, and so eliminates the livelock. However, I do think it would be an interesting problem to improve the behavior of the runtime scheduler to deal with situations like this.
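A sketch of what that throttled writer loop could look like (go here is just a stand-in for whatever the original writer loop sends):

    import Control.Concurrent (threadDelay)
    import Control.Concurrent.STM

    -- Writer loop with a small delay between successive writes, giving all
    -- the readers time to block on the channel again before the next event.
    writer :: TChan Int -> [Int] -> IO ()
    writer chan = go
      where
        go []       = return ()
        go (e : es) = do
          atomically (writeTChan chan e)
          threadDelay 100     -- ~100 microseconds
          go es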
Adding to what Neil said, your code also has a space leak, noticeable with smaller n. After fixing the obvious tuple build-up by making the tuples strict, I was left with a profile that, I think, is explained by the main thread writing data to the shared TChan faster than the worker threads can read it (TChan, like Chan, is unbounded). So the worker threads spend most of their time re-executing their respective STM transactions, while the main thread is busy stuffing even more data into the channel; this explains why your program hangs.
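As an illustration of the strictness fix, one way to force the values before they ever reach the channel (the Event type and its fields here are assumptions, not the poster's actual types):

    {-# LANGUAGE BangPatterns #-}
    import Control.Concurrent.STM

    -- A strict pair: the strict fields force both components as soon as the
    -- constructor itself is forced, so unevaluated thunks don't accumulate
    -- inside the channel.
    data Event = Event !Int !Int

    writeEvent :: TChan Event -> Int -> Int -> IO ()
    writeEvent chan x y =
      let !e = Event x y          -- force the event before it is enqueued
      in atomically (writeTChan chan e)

A bounded channel such as TBQueue from the stm package would also keep the writer from running arbitrarily far ahead of the readers, since writeTBQueue retries when the queue is full.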
The program is going to perform quite badly. You're spawning off 10,000 threads, all of which will queue up waiting for a single TVar to be written to, so each item will cause O(10,000) processing. If you see 100 events per second, that is roughly 10 ms per event, which spread over 10,000 wake-ups means each thread requires about 1 microsecond to wake up, read a couple of TVars, write to one and queue up again. That doesn't seem so unreasonable. I don't understand why the program would grind to a complete halt, though.
In general, I would scrap this design and replace it as follows:
Have a single thread reading the event channel, which maintains a map from coordinate to interested-receiver-channel. The single thread can then pick out the receiver(s) from the map in O(log N) time (much better than O(N), and with a much smaller constant factor involved), and send the event to just the interested receiver. So you perform just one or two communications to the interested party, rather than 10,000 communications to everyone. A list-based form of the idea is written in CHP in section 5.4 of this paper: http://chplib.files.wordpress.com/2011/05/chp.pdf
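A rough sketch of that single-dispatcher design using Data.Map (the Coord type, the Event shape and how receivers get registered are assumptions for illustration):

    import Control.Concurrent.STM
    import Control.Monad (forever)
    import qualified Data.Map.Strict as Map

    -- Assumed types for illustration.
    type Coord = (Int, Int)
    data Event = Event { evCoord :: Coord, evPayload :: String }

    -- The single dispatcher thread owns the map from coordinate to the
    -- channel of the agent interested in that coordinate.
    dispatcher :: TChan Event -> Map.Map Coord (TChan Event) -> IO ()
    dispatcher events receivers = forever $ do
      ev <- atomically (readTChan events)
      -- O(log N) lookup instead of waking every agent on every event.
      case Map.lookup (evCoord ev) receivers of
        Just dest -> atomically (writeTChan dest ev)
        Nothing   -> return ()    -- no one registered for this coordinate

Each agent then blocks on its own small channel, so an event wakes only the one or two threads that actually care about it.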