1: What is "non-blocking" concurrency and how is it different than normal concurrency using threads? Why don't we use non-blocking concurrency in all the scenarios where concurrency is required? Is there overhead for using non-blocking concurrency?
Non-blocking algorithms do not use specific object locking schemes to control concurrent access to memory (in Java, synchronized and the standard lock objects are examples that use object- or function-level locks to reduce concurrent access problems). Instead, they use some form of low-level instruction to perform an atomic compare-and-swap on a memory location; if this fails it just returns false and does not error out, and if it works then the update was successful and you move on. Generally, this is attempted in a loop until it works. Since there will (hopefully) only be small periods of time when it would fail, it just loops a few extra times until it can set the memory it needs to.
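To make the retry loop concrete, here is a minimal sketch using java.util.concurrent.atomic.AtomicInteger. The class name CasCounter and its methods are made up for illustration, and the lambda syntax assumes Java 8 or later.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, not from the original answer.
class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    // Increment without taking a lock: read, compute, then attempt the swap.
    int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            // compareAndSet returns false if another thread changed the value
            // after we read it; in that case we simply loop and try again.
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    int get() {
        return value.get();
    }

    public static void main(String[] args) throws InterruptedException {
        CasCounter counter = new CasCounter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    counter.increment();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(counter.get()); // prints 4000 (4 threads x 1000 increments)
    }
}
```

In real code you would just call AtomicInteger.incrementAndGet(), which does this for you; the explicit loop is only there to show the mechanism.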
This is not always used because it is much more complex from a code perspective, even for relatively trivial use cases, than the standard Java synchronization. Moreover, for most uses the performance impact of the locking is trivial compared to other sources of overhead in the system. In most cases, the performance requirements are not nearly high enough to warrant even looking at this.
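For a sense of how little code the standard approach needs, this is roughly what a lock-based shared counter looks like. The class name SynchronizedCounter is hypothetical, and the lambda syntax assumes Java 8 or later.

```java
// Hypothetical example class, not from the original answer.
class SynchronizedCounter {
    private int value = 0;

    // The intrinsic lock on `this` turns the read-modify-write into one critical section.
    synchronized int increment() {
        value++;
        return value;
    }

    synchronized int get() {
        return value;
    }

    public static void main(String[] args) throws InterruptedException {
        SynchronizedCounter counter = new SynchronizedCounter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    counter.increment();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(counter.get()); // prints 4000 (4 threads x 1000 increments)
    }
}
```

There is no retry logic to get wrong here, which is most of the maintainability argument.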
Finally, as the JDK/JRE evolves, the core designers keep improving the internal implementations so that the core constructs use the most efficient means of achieving these ends. As you move away from the core constructs toward less standard implementations, you lose those automatic improvements when you bump your Java version (for instance JAXB/JiBX: JAXB used to grossly underperform JiBX, but is now equal if not faster in most cases that I've tested as of Java 7).
If you look at the code example below, you can see the 'overhead' locations. It's not really overhead per se, but because of the looping the code must be extremely efficient in order to work without locks and actually perform better than a standard synchronized version. Even slight modifications can turn code that performs several times better than standard into code that is several times worse (for instance object instantiations that don't need to be there, or even quick conditional checks; you're talking about saving cycles here, so the difference between success and failure is very slim).
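One way to see where the looping cost shows up is to count how often the swap fails. The sketch below is only an illustration: the class RetryCountingCounter and its failedAttempts field are hypothetical, the extra counter is itself a contended atomic (so it distorts what it measures), and the lambda syntax assumes Java 8 or later.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class, not from the original answer.
class RetryCountingCounter {
    private final AtomicLong value = new AtomicLong();
    // Instrumentation only: number of compareAndSet calls that lost a race.
    private final AtomicLong failedAttempts = new AtomicLong();

    long increment() {
        while (true) {
            long current = value.get();
            if (value.compareAndSet(current, current + 1)) {
                return current + 1;
            }
            // Every pass through here is work that was thrown away and redone.
            failedAttempts.incrementAndGet();
        }
    }

    long get() {
        return value.get();
    }

    long failedAttempts() {
        return failedAttempts.get();
    }

    public static void main(String[] args) throws InterruptedException {
        RetryCountingCounter counter = new RetryCountingCounter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) {
                    counter.increment();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // The total is deterministic; the retry count varies from run to run.
        System.out.println("total=" + counter.get() + " retries=" + counter.failedAttempts());
    }
}
```

The more work you put between the get() and the compareAndSet(), the wider the window for another thread to get in first, and the higher the retry count climbs.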
2: I have heard that non-blocking concurrency is available in Java. Are there any particular scenarios where we should use this feature?
In my opinion you should only use this if A) you have a proven performance problem in your running system in production, on its production hardware; B) you can prove that the only inefficiency left in the critical section is locking related; C) you have firm buy-in from your stakeholders that they are willing to accept non-standard, less maintainable code in return for the performance improvement; and D) you can prove numerically, on your production hardware, that it will actually help.
3: Is there a difference or advantage to using one of these methods with a collection? What are the trade-offs?
The advantage is performance. The first trade-off is that it's more specialized code, so many developers don't know what to make of it, which makes it harder for a new team or new hire to come up to speed (remember that the majority of the cost of software is labor, so you have to watch the total cost of ownership that you impose through design decisions). The second is that any modification has to be tested again to ensure that the construct is still actually faster. Generally, a system that requires this would also require some performance or load and throughput testing for any change. If you aren't doing these tests then I would argue that you almost certainly don't need to even think about these approaches, and would almost definitely not see any value for the increased complexity (even if you got it all to work right).
Again, I just have to restate all the standard warnings against optimization generally, as many of those arguments are the same ones I would use against this as a design. Many of the drawbacks are the same as for any optimization. For instance, whenever you change the code you have to ensure that your 'fix' doesn't introduce inefficiency in some construct that was only placed there to improve performance, and if the fix is critical and it does reduce performance you have to deal with that (up to refactoring the entire section to remove the optimizations).
It is really, really easy to mess this up in ways that are very difficult to debug, so if you don't have to do it, do not do it. I have only found a few scenarios where you ever would, and to me those were pretty questionable and I would have preferred not to do it. Use the standard stuff and everyone will be happier!
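As an example of "the standard stuff" for collections: the JDK already ships java.util.concurrent.ConcurrentLinkedQueue, whose documentation describes it as using a non-blocking algorithm, so you get lock-free behaviour without writing the loop yourself. The wrapper class SampleCollector below is hypothetical, and the lambda syntax assumes Java 8 or later.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical example class, not from the original answer.
class SampleCollector {
    // The queue handles concurrent access internally; callers take no locks.
    private final Queue<String> samples = new ConcurrentLinkedQueue<>();

    void record(String message) {
        samples.offer(message);
    }

    int size() {
        // Note: size() on this queue walks the elements, so it is not constant time.
        return samples.size();
    }

    public static void main(String[] args) throws InterruptedException {
        SampleCollector collector = new SampleCollector();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 250; j++) {
                    collector.record("thread-" + id + "-msg-" + j);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(collector.size()); // prints 1000 (4 threads x 250 messages)
    }
}
```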
Discussion/Code
Non-blocking or lock-free concurrency avoids the use of specific object locks to control shared memory access (like synchronized blocks or specific locks). There is a performance advantage when the code section is non-locking; however, the code in the CAS loop (if this is the way you go; there are other methods in Java) must be very, very efficient or this will end up costing you more performance than you gain.
Like all performance optimizations, the extra complexity is not worth the effort for most use cases. Cleanly written Java using standard constructs will work as well as, if not better than, most optimizations (and will actually allow your organization to maintain the software more easily once you're gone). To my mind this only makes sense in very high performance sections with proven performance issues where the locking is the only source of inefficiency. If you do not have a known and quantified performance problem, I would avoid any technique like this until you have proven that the problem is actually there because of the locking and not due to other issues with the efficiency of the code. Once you have a proven locking-based performance problem, I would make sure you have some type of metric in place to confirm that this type of setup is actually going to run faster for you than just using standard Java concurrency.
The implementation that I have done for this uses CAS operations and the Atomic family of variables. This basic code has never locked up or raised any errors for me in this use case (random sampling of input and output for offline testing from a high throughput translation system). It basically works like this:
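As a rough idea of what "a metric in place" could look like, here is a deliberately crude timing sketch that runs the same increment workload under a synchronized block and under an atomic. Everything in it (the class LockVsCasTiming, the thread count, the iteration count) is an assumption for illustration; it has no JIT warm-up and says nothing about your production hardware, so a proper harness such as JMH is the better tool for real decisions. The lambda syntax assumes Java 8 or later.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class, not from the original answer.
class LockVsCasTiming {
    static long lockedValue = 0;
    static final Object LOCK = new Object();
    static final AtomicLong casValue = new AtomicLong();

    // Runs `task` on `threads` threads and returns the elapsed wall time in nanoseconds.
    static long time(int threads, Runnable task) throws InterruptedException {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(task);
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        final int perThread = 1_000_000;
        long lockedNanos = time(4, () -> {
            for (int i = 0; i < perThread; i++) {
                synchronized (LOCK) {
                    lockedValue++;
                }
            }
        });
        long casNanos = time(4, () -> {
            for (int i = 0; i < perThread; i++) {
                casValue.incrementAndGet();
            }
        });
        // Timings vary by machine and run; only the totals are deterministic.
        System.out.println("synchronized: " + lockedNanos / 1_000_000 + " ms, total " + lockedValue);
        System.out.println("atomic:       " + casNanos / 1_000_000 + " ms, total " + casValue.get());
    }
}
```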
You have some object that is shared between threads, and this is declared as either an AtomicXXX or an AtomicReference (for most non-trivial use cases you'll run with the AtomicReference version).
When the given value/object is referenced, you retrieve it from the Atomic wrapper and work on a local copy, to which you apply your modification. From here you use compareAndSet as the condition of a while loop to attempt to set this Atomic from your thread; if this fails it returns false as opposed to locking up. This will iterate until it works (the code in this loop must be very efficient and simple).
You can look up CAS operations to see how they work; a CAS is basically intended to be implemented as a single atomic instruction that compares the current value with the value you expect and only writes the new value if they match.
If the compareAndSet fails, you get your object again from the Atomic wrapper, perform any modifications again, and then try the compare-and-set again until it works. There is no specific lock; you're just trying to set the object back into memory, and if it fails you just try again whenever your thread gets control again.
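To make the compare-and-set behaviour itself concrete, this small single-threaded sketch shows one call that succeeds and one that fails because the expected value is stale. The class name CasSemantics and its demo() method are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, not from the original answer.
class CasSemantics {
    static String demo() {
        AtomicInteger cell = new AtomicInteger(5);
        // Expected value matches the current value, so the write happens.
        boolean first = cell.compareAndSet(5, 6);
        // Expected value (5) is now stale, so nothing is written and false is returned.
        boolean second = cell.compareAndSet(5, 7);
        return first + " " + second + " " + cell.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "true false 6"
    }
}
```

The failed call neither blocks nor throws; it just reports that the value moved, which is what the retry loop reacts to.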
Code for this is below for a simple case with a List:
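import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/* field declaration */
// Note that I have an initialization block which ensures that the object in this
// reference is never null; this was required to remove null checks and ensure the CAS
// loop was efficient enough to improve performance in my use case.
// The element type String is an assumption; substitute the type of `message`.
private AtomicReference<List<String>> specialSamplingRulesAtomic = new AtomicReference<List<String>>();
/* start of interesting code section */
List<String> current;
List<String> updated;
do {
    // Snapshot the current list and modify a private copy; adding to the shared
    // list in place would be visible to other threads before the CAS succeeds.
    current = specialSamplingRulesAtomic.get();
    updated = new ArrayList<String>(current);
    updated.add(message);
    // Publish the copy only if no other thread replaced the list in the meantime.
} while (!specialSamplingRulesAtomic.compareAndSet(current, updated));
/* end of interesting code section */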
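For anyone who wants to run this, here is a self-contained sketch of the same idea wrapped in a class. The names SamplingRules, add and snapshot are hypothetical, the String element type is an assumption, and the lambda syntax assumes Java 8 or later. This sketch copies the list before modifying it and passes the reference it originally read as the expected value, so a concurrent update by another thread is detected and the loop retries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical example class, not from the original answer.
class SamplingRules {
    // Starts with an empty list so the reference is never null.
    private final AtomicReference<List<String>> rules =
            new AtomicReference<List<String>>(Collections.<String>emptyList());

    void add(String message) {
        List<String> current;
        List<String> updated;
        do {
            current = rules.get();               // snapshot the published list
            updated = new ArrayList<>(current);  // never mutate a published list
            updated.add(message);
            // compareAndSet compares references: it succeeds only if `current`
            // is still the list held by the AtomicReference.
        } while (!rules.compareAndSet(current, updated));
    }

    List<String> snapshot() {
        return Collections.unmodifiableList(rules.get());
    }

    public static void main(String[] args) throws InterruptedException {
        SamplingRules rules = new SamplingRules();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 250; j++) {
                    rules.add("thread-" + id + "-msg-" + j);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(rules.snapshot().size()); // prints 1000, no lost updates
    }
}
```

Keep in mind that every attempt copies the whole list, so the cost of a retry grows with the list size. That is exactly the kind of work inside the loop that can make this slower than plain synchronization, so measure before committing to it.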