I am using BlockingQueue:s (trying both ArrayBlockingQueue and LinkedBlockingQueue) to pass objects between different threads in an application I’m currently working on. Perform
Your test is not a good measure of queue handoff latency because you have a single thread reading off the queue which writes synchronously to System.out
(doing a String and long concatenation while it is at it) before it takes again. To measure this properly you need to move this activity out of this thread and do as little work as possible in the taking thread.
You'd be better off just doing the calculation (then-now) in the taker and adding the result to some other collection which is periodically drained by another thread that outputs the results. I tend to do this by adding to an appropriately presized array backed structure accessed via an AtomicReference (hence the reporting thread just has to getAndSet on that reference with another instance of that storage structure in order to grab the latest batch of results; e.g. make 2 lists, set one as active, every x s a thread wakes up and swaps the active and the passive ones). You can then report some distribution instead of every single result (e.g. a decile range) which means you don't generate vast log files with every run and get useful information printed for you.
FWIW I concur with the times Peter Lawrey stated & if latency is really critical then you need to think about busy waiting with appropriate cpu affinity (i.e. dedicate a core to that thread)
EDIT after Jan 6
If I remove the call to Thread.sleep () and instead let both the producer and consumer call barrier.await() in every iteration (the consumer calls it after having printed the elapsed time to the console), the measured latency is reduced from 60 microseconds to below 10 microseconds. If running the threads on the same core, the latency gets below 1 microsecond. Can anyone explain why this reduced the latency so significantly?
You're looking at the difference between java.util.concurrent.locks.LockSupport#park
(and corresponding unpark
) and Thread#sleep
. Most j.u.c. stuff is built on LockSupport
(often via an AbstractQueuedSynchronizer
that ReentrantLock
provides or directly) and this (in Hotspot) resolves down to sun.misc.Unsafe#park
(and unpark
) and this tends to end up in the hands of the pthread (posix threads) lib. Typically pthread_cond_broadcast
to wake up and pthread_cond_wait
or pthread_cond_timedwait
for things like BlockingQueue#take
.
I can't say I've ever looked at how Thread#sleep
is actually implemented (cos I've never come across something low latency that isn't a condition based wait) but I would imagine that it causes it to be demoted by the schedular in a more aggressive way than the pthread signalling mechanism and that is what accounts for the latency difference.