In terms of actual low level atomic instructions and memory fences (I assume they\'re used), how do you implement STM?
The part that\'s mysterious to me is that given s
GHC's STM implementation is described in section six of:
Composable Memory Transactions. Tim Harris, Simon Marlow, Simon Peyton Jones, Maurice Herlihy. PPoPP'05: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Chicago, Illinois, June 2005
And section five of:
Transactional memory with data invariants. Tim Harris, Simon Peyton-Jones. March 2006 TRANSACT '06
If you are going with .NET framework,
You can check out this experimental
I suggest you watch this presentation: http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey
In the second half it explains how to update values without leaving them in an undefined state. For example - if you have a tree that you want to update in STM-style you don't change the previous version at all. Let's say that tree
is a pointer to the root of the tree. The only thing you create is the nodes that changed (but they can refer to nodes in the original snapshot of the tree.
Then you do a compare-and-swap on the tree
pointer. If it succeeded, then everyone will now see your new tree and the old one can be garbage-collected. If it hasn't, then you repeat the process and the tree you just constructed is garbage collected.
The big idea is that you don't need to detect if anyone else changed the tree
until you actually swap the new and old values, so there are no "conflicts" or "clobbered values" from the typical multithreaded programming.
The simplest answer is "it depends". There are tons of radically different implementations working in pretty much any way imaginable.
The part that's mysterious to me is that given some arbitrary chunk of code, you need a way to go back afterward and determine if the values used in each step were valid. How do you do that, and how do you do it efficiently?
One solution is to use versioning. Every time an object is modified, its version number is updated. While the transaction is running, you validate each accessed object's version, and when the transaction commits, you verify that the objects are still valid.
This validation can be a simple integer comparison (if transaction_start_version >= object_version
, the object is valid), so it can be done fairly efficiently.
This would also seem to suggest that just like any other 'locking' solution you want to keep your critical sections as small as possible (to decrease the probability of a conflict), am I right?
Very likely. I think a few implementations have gone the route of assuming/requiring everything to be a transaction, but yes, in most implementations, transactions are specially marked chunks of code, and the longer a transaction runs, the larger the chance of a conflict that may cause transactions to roll back.
Also, can STM simply detect "another thread entered this area while the computation was executing, therefore the computation is invalid" or can it actually detect whether clobbered values were used (and thus by luck sometimes two threads may execute the same critical section simultaneously without need for rollback)?
The latter. Keep in mind that the idea in TM is to protect data, rather than code.
Different code paths may access the same variable in different transactions. This has to be detected by the TM system. There's no real notion of "this area", since that refers to code, rather than data. The TM system doesn't care what code is executing, it tracks what data is being modified. In that way, it differs entirely from critical sections (which protect code, rather than data)
Some papers can be really difficult to read but two which are very clear and concise are:
Transactional Locking II, Dave Dice, Ori Shalev, Nir Shavit which describes the "TL2" STM algorithm in terms of any language.
Deuce: Noninvasive Software Transactional Memory in Java, Guy Korland, Nir Shavit, Pascal Felber which explains a load time class transformer which transforms regular java classes into in-memory classes which have additional bytecode to do STM. This is relevant to the question as the paper explains how code without STM can be mechanically transformed into code which is doing STM in any OO language.
The Deuce framework lets you plugin the actual algorithm you wish to use; including the TL2 algorithm. As the Deuce framework is Java the following discussion uses java terminology but is only assuming that you are writing in an OO language.
Below will outline the TL2 approach. The papers have links to alternative approaches but studying one algorithm answers a lot of questions.
how do you implement STM? The part that's mysterious to me is that given some arbitrary chunk of code, you need a way to go back afterward and determine if the values used in each step were valid.
One short answer for how TL2 does STM is "bookkeeping" and then only using write locks at commit time. Read the paper for the fine detail but a board brush outline is as follows. Every property that you can use in the original code would have a getter and setter. In the transformed code there would also be a version number of the property and additional code added to the getter and setter. You need to record the version of every attribute you read within the transaction as the transaction "read-set". You can do this by having every getter add the version of the attribute seen into a threadlocal linkedlist. You also need to buffer the writes as the "write-set" in a threadlocal until you commit. Note that the getter methods need to check and return a threadlocal write-set value for a given field if you have one. That way you see your uncommitted writes in your reads but no other thread is going to see them until you commit.
At commit time you take write locks against every attribute you are about to write. Whilst you have the locks you double check that your read-set is still valid; that the attributes you read in your transaction have not been updated to a higher version by another transaction. If so then your business logic may not be valid as you can have inconsistent reads so you need to rollback the whole transaction. If the final checks pass then you commit by flushing your write-set, bump the versions of those attributes, release the write locks, and final clear both the write-set and read-set threadlocal lists.
The paper explains that the algorithm can abort early if it detects that an attribute being read has been written to since the tx started. The paper has some neat tricks to speed up read-only transactions. It even has a trick to work out which blocks are read-only and which are read-write. Anyone expressing an interest in such things really should enjoy the two papers.
The Deuce framework in the paper above shows how to change all your getters and setters by injecting new java bytecode into your classes at load time. Other languages could have a special compiler or preprocessor which perform the same mechanical transformation of normal code into STM enabled code. Specifically with Deuce your source code objects can have simple getter setter pairs but transformed classes at runtime have enriched getter setters which do the bookwork.
Transforming regular code into STM code (particularly at runtime) is interesting but if you need to actually write a STM enabled data structure you don't need any magic sauce. Instead just create a Ref
class with a get()
and a set(x)
and make every single relationship between your data structure objects made up of Ref
handles. In the get
and set
of your Ref
class you can do the threadlocal bookwork. Then you can implement a simple version of "TL2" or any other algorithm which works well for your data structures and your concurrency of read versus write.
This would also seem to suggest that just like any other 'locking' solution you want to keep your critical sections as small as possible (to decrease the probability of a conflict), am I right?
TL2 has a critical period in holding the write locks then doing final checks and writes which is easy to comprehend and optimise without any understanding of the application business logic. If you assign each number a unique property you can trivially avoid deadlock by taking locks in ascending order. It is important to note that all your business logic is done speculatively assuming that the commit checks will pass. You don't hold locks whilst you are doing arbitrary slow business logic. You can be doing multiple web service lookups or slow database calls and you won't take any locks until the commit. Clearly the professionals are going to tune the heck out of the generic critical section.
The paper makes it clear that the algorithm may be aborting more often that the specific business logic requires. The generic algorithm does not know whether specific dirty reads would not affect the actual write outcome. Handwritten logic which understands the actual business logic could know special cases when a rollback is not needed for a given sets of dirty reads. If however you have lots of code to write and an application where the likelihood of rollback is very low a generic mechanical STM approach may lead to less bugs and perform well.
Also, can STM simply detect "another thread entered this area while the computation was executing, therefore the computation is invalid" or can it actually detect whether clobbered values were used (and thus by luck sometimes two threads may execute the same critical section simultaneously without need for rollback)?
The TL2 approach as all about the data read or written not about the code which does it. It is what you get and set and which counts; and whether any other thread trod on your toes before you flush all the writes. All that is required of the code is that you have a begin()
, commit()
and rollback()
in the business logic to start, end and abort the transaction. Even that can be generated code. With Java you could mark your methods with the @Transactional annotation on methods then generate code which wrap your method invocations in a try/catch/finally that does the begin/commit/rollback as idiomic java. Deuce injects such logic at class load time.
Once again you don't need such magic sauce to begin/commit/rollback in your own STM enabled data structures. You can be explicit and put in all that right into your data structure logic code to create your own explicitly STM enabled classes in any OO language.