I\'m reading Joe Duffy\'s post about Volatile reads and writes, and timeliness, and i\'m trying to understand something about the last code sample in the post:
<
According to ECMA-335 (section I.12.6.5):
5. Explicit atomic operations. The class library provides a variety of atomic operations in the System.Threading.Interlocked class. These operations (e.g., Increment, Decrement, Exchange, and CompareExchange) perform implicit acquire/release operations.
So, these operations follow principle of least astonishment.
Any x86 instruction that has lock prefix has full memory barrier. As shown Abel's answer, Interlocked* APIs and CompareExchanges use lock-prefixed instruction such as lock cmpxchg
. So, it implies memory fence.
Yes, Interlocked.CompareExchange uses a memory barrier.
Why? Because x86 processors did so. From Intel's Volume 3A: System Programming Guide Part 1, Section 7.1.2.2:
For the P6 family processors, locked operations serialize all outstanding load and store operations (that is, wait for them to complete). This rule is also true for the Pentium 4 and Intel Xeon processors, with one exception. Load operations that reference weakly ordered memory types (such as the WC memory type) may not be serialized.
volatile
has nothing to do with this discussion. This is about atomic operations; to support atomic operations in CPU, x86 guarantees all previous loads and stores to be completed.
The interlocked functions are guaranteed to stall the bus and the cpu while it resolves the operands. The immediate consequence is that no thread switch, on your cpu or another one, will interrupt the interlocked function in the middle of its execution.
Since you're passing a reference to the c# function, the underlying assembler code will work with the address of the actual integer, so the variable access won't be optimized away. It will work exactly as expected.
edit: Here's a link that explains the behaviour of the asm instruction better: http://faydoc.tripod.com/cpu/cmpxchg.htm
As you can see, the bus is stalled by forcing a write cycle, so any other "threads" (read: other cpu cores) that would try to use the bus at the same time would be put in a waiting queue.
There seems to be some comparison with the Win32 API functions by the same name, but this thread is all about the C# Interlocked
class. From its very description, it is guaranteed that its operations are atomic. I'm not sure how that translates to "full memory barriers" as mentioned in other answers here, but judge for yourself.
On uniprocessor systems, nothing special happens, there's just a single instruction:
FASTCALL_FUNC CompareExchangeUP,12
_ASSERT_ALIGNED_4_X86 ecx
mov eax, [esp+4] ; Comparand
cmpxchg [ecx], edx
retn 4 ; result in EAX
FASTCALL_ENDFUNC CompareExchangeUP
But on multiprocessor systems, a hardware lock is used to prevent other cores to access the data at the same time:
FASTCALL_FUNC CompareExchangeMP,12
_ASSERT_ALIGNED_4_X86 ecx
mov eax, [esp+4] ; Comparand
lock cmpxchg [ecx], edx
retn 4 ; result in EAX
FASTCALL_ENDFUNC CompareExchangeMP
An interesting read with here and there some wrong conclusions, but all-in-all excellent on the subject is this blog post on CompareExchange.
As often, the answer is, "it depends". It appears that prior to 2.1, the ARM had a half-barrier. For the 2.1 release, this behavior was changed to a full barrier for the Interlocked
operations.
The current code can be found here and actual implementation of CompareExchange here. Discussions on the generated ARM assembly, as well as examples on generated code can be seen in the aforementioned PR.
ref
doesn't respect the usual volatile
rules, especially in things like:
volatile bool myField;
...
RunMethod(ref myField);
...
void RunMethod(ref bool isDone) {
while(!isDone) {} // silly example
}
Here, RunMethod
is not guaranteed to spot external changes to isDone
even though the underlying field (myField
) is volatile
; RunMethod
doesn't know about it, so doesn't have the right code.
However! This should be a non-issue:
Interlocked
, then use Interlocked
for all access to the fieldlock
, then use lock
for all access to the fieldFollow those rules and it should work OK.
Re the edit; yes, that behaviour is a critical part of Interlocked
. To be honest, I don't know how it is implemented (memory barrier, etc - note they are "InternalCall" methods, so I can't check ;-p) - but yes: updates from one thread will be immediately visible to all others as long as they use the Interlocked
methods (hence my point above).
MSDN says about the Win32 API functions: "Most of the interlocked functions provide full memory barriers on all Windows platforms"
(the exceptions are Interlocked functions with explicit Acquire / Release semantics)
From that I would conclude that the C# runtime's Interlocked makes the same guarantees, as they are documented withotherwise identical behavior (and they resolve to intrinsic CPU statements on the platforms i know). Unfortunately, with MSDN's tendency to put up samples instead of documentation, it isn't spelled out explicitly.