My question is related to lock-free synchronization in multithreaded code. I wanted to know the following:
What are the general approaches to achieve this? I read somewhere
There are some useful ways to use lock-free synchronization (such as those @Tudor mentions), but I want to warn about one thing: lock-free synchronization doesn't compose.
You may have, for example, an integer maintained by compare-and-swap, and it's fine on its own. You may also have a queue maintained by a lock-free algorithm (it's a bit tricky, but there are good algorithms for it), and the queue is fine too.
But if you try to use the counter to count the elements in the queue, you'll get wrong answers. There will be moments when an element has been added but the counter doesn't yet reflect it (or vice versa), and you can get bugs if you trust the count (e.g. you may try to add to a full queue).
In short: each structure is consistent with itself, but the two are not consistent with each other.
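To make the composition problem concrete, here is a minimal sketch of the counter-plus-queue situation described above (the class and method names are my own; only the `java.util.concurrent` types are real):

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Both the queue and the counter are individually lock-free and correct,
// but the pair of operations in add()/poll() is not atomic as a whole:
// another thread can observe the queue and the counter out of sync.
public class QueueWithCounter<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger size = new AtomicInteger();

    public void add(T item) {
        queue.add(item);          // another thread may run between these two
        size.incrementAndGet();   // lines and see a count smaller than reality
    }

    public T poll() {
        T item = queue.poll();
        if (item != null) {
            size.decrementAndGet(); // here a reader may briefly see a count
        }                           // larger than the actual queue length
        return item;
    }

    public int approximateSize() {
        return size.get(); // only an approximation while writers are active
    }
}
```

In a single thread the count is always right; the inconsistency only appears in the windows between the two operations, which is exactly why trusting the counter across threads is dangerous.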
Compare-and-swap is useful, but there is an even simpler (so-called 'lock-free') technique that applies to certain producer/consumer use cases, so I will mention it.
Imagine you have a function doWork() that writes to a buffer.
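The original code isn't shown here, so this is a reconstructed sketch of the pattern the next paragraph refers to: writer thread B fills the buffer in `doWork()` and then sets a volatile flag, and reader thread A spins on that flag before touching the buffer (the names `doWork`, `finished`, and `awaitResult` are assumptions):

```java
// Single-writer / single-reader handoff via a volatile flag.
// In Java, the volatile write of 'finished' happens-before the volatile
// read that observes it, so all plain writes to the buffer made before
// the flag was set are visible to the reader afterwards.
public class Handoff {
    private final int[] buffer = new int[1024];
    private volatile boolean finished = false; // the synchronization point

    // Run by writer thread B.
    public void doWork() {
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = compute(i);   // plain, unsynchronized writes...
        }
        finished = true;              // ...published by this volatile write
    }

    // Run by reader thread A.
    public int[] awaitResult() {
        while (!finished) {           // volatile read; spin until published
            Thread.onSpinWait();      // Java 9+ hint that we're busy-waiting
        }
        return buffer;                // safe: all writes above are visible
    }

    private int compute(int i) { return i * i; } // stand-in for real work
}
```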
This only works because A only reads and B only writes, but that use case is fairly common for 'background worker' threads. Note that this is only guaranteed to work in languages like Java or C#, where volatile comes with these visibility guarantees.
Here are some general approaches that can minimize the use of locks, assuming your algorithm has some particular exploitable features:
When updating a single numeric variable, you can use non-blocking primitives such as CAS, atomic_increment, etc. They are usually much faster than a classic blocking critical section (lock, mutex).
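In Java, the `java.util.concurrent.atomic` classes expose exactly these primitives. A minimal sketch (the `HitCounter` class is illustrative; the `AtomicLong` API is real):

```java
import java.util.concurrent.atomic.AtomicLong;

// A lock-free counter: incrementAndGet() compiles down to an atomic
// hardware instruction or a CAS retry loop, and never blocks a thread.
public class HitCounter {
    private final AtomicLong hits = new AtomicLong();

    public long record() {
        return hits.incrementAndGet();
    }

    // The same update written as an explicit compare-and-swap loop,
    // to show the general shape of CAS-based updates: read the current
    // value, compute the new one, and retry if another thread won the race.
    public long recordWithCas() {
        long current;
        do {
            current = hits.get();
        } while (!hits.compareAndSet(current, current + 1));
        return current + 1;
    }
}
```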
When a data structure is read by multiple threads but written by only one or a few threads, an obvious solution is a read-write lock instead of a full lock.
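A sketch of this in Java using `ReentrantReadWriteLock` (the `Registry` class and its methods are illustrative names, not from the original answer):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Many readers, few writers: readers proceed concurrently under the
// shared read lock; only writers take the exclusive write lock.
public class Registry {
    private final Map<String, String> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public String get(String key) {
        lock.readLock().lock();       // shared: many readers at once
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(String key, String value) {
        lock.writeLock().lock();      // exclusive: blocks readers and writers
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```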
Try to exploit fine-grained locking. For example, instead of locking an entire data structure with a single lock, see if you can use several locks that each protect a distinct section of the data structure.
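One common form of this is lock striping, as used internally by hash-based concurrent containers. A minimal sketch (the `StripedMap` class, the stripe count, and the hashing scheme are all illustrative choices):

```java
import java.util.HashMap;
import java.util.Map;

// A hash map split into independently locked stripes: updates to keys
// that hash to different stripes never contend with each other.
public class StripedMap {
    private static final int STRIPES = 16;
    private final Object[] locks = new Object[STRIPES];
    private final Map<String, String>[] buckets;

    @SuppressWarnings("unchecked")
    public StripedMap() {
        buckets = new HashMap[STRIPES];
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new Object();
            buckets[i] = new HashMap<>();
        }
    }

    private int stripe(String key) {
        return Math.floorMod(key.hashCode(), STRIPES);
    }

    public void put(String key, String value) {
        int s = stripe(key);
        synchronized (locks[s]) {      // locks one stripe, not the whole map
            buckets[s].put(key, value);
        }
    }

    public String get(String key) {
        int s = stripe(key);
        synchronized (locks[s]) {
            return buckets[s].get(key);
        }
    }
}
```

In practice you would reach for `java.util.concurrent.ConcurrentHashMap` instead, which applies this idea with far more care.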
If you're relying on the implicit memory-fence effect of locks to ensure visibility of a single variable across threads, just use volatile¹, if available.
Sometimes, using a condition variable (and its associated lock) is too slow in practice. In this case, a volatile busy-spin is much more efficient.
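A sketch of trading a condition-variable wait for a volatile busy-spin (the `SpinGate` class and its methods are hypothetical names). The spinning thread burns a CPU core instead of sleeping, so this only pays off when the expected wait is very short:

```java
// One thread publishes a value and opens the gate with a volatile write;
// the waiting thread spins on the volatile flag instead of blocking on a
// condition variable, avoiding the cost of parking and unparking.
public class SpinGate {
    private int value;                    // plain write, published by 'open'
    private volatile boolean open = false;

    public void publish(int v) {
        value = v;
        open = true;                      // volatile write: visible to spinners
    }

    public int await() {
        while (!open) {
            Thread.onSpinWait();          // hint to the CPU that we're spinning
        }
        return value;                     // safe after the volatile read
    }
}
```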
More good advice on this topic here: http://software.intel.com/en-us/articles/intel-guide-for-developing-multithreaded-applications/
A nice read in another SO question: Lock-free multi-threading is for real threading experts (don't be scared by the title).
And a recently discussed lock-free Java implementation of atomic_decrement: Starvation in non-blocking approaches
¹ The use of volatile here applies to languages such as Java, where volatile has defined semantics in the memory model, but not to C or C++, where volatile predates the introduction of the cross-thread memory model and doesn't integrate with it. Similar constructs are available in those languages, such as the various std::memory_order specifiers in C++.