Volatile vs. Interlocked vs. lock

后端 未结 9 1596
悲哀的现实
悲哀的现实 2020-11-22 05:54

Let\'s say that a class has a public int counter field that is accessed by multiple threads. This int is only incremented or decremented.

T

9条回答
  •  遇见更好的自我
    2020-11-22 06:22

    Either lock or interlocked increment is what you are looking for.

    Volatile is definitely not what you're after - it simply tells the compiler to treat the variable as always changing even if the current code path allows the compiler to optimize a read from memory otherwise.

    e.g.

    while (m_Var)
    { }
    

    if m_Var is set to false in another thread but it's not declared as volatile, the compiler is free to make it an infinite loop (but doesn't mean it always will) by making it check against a CPU register (e.g. EAX because that was what m_Var was fetched into from the very beginning) instead of issuing another read to the memory location of m_Var (this may be cached - we don't know and don't care and that's the point of cache coherency of x86/x64). All the posts earlier by others who mentioned instruction reordering simply show they don't understand x86/x64 architectures. Volatile does not issue read/write barriers as implied by the earlier posts saying 'it prevents reordering'. In fact, thanks again to MESI protocol, we are guaranteed the result we read is always the same across CPUs regardless of whether the actual results have been retired to physical memory or simply reside in the local CPU's cache. I won't go too far into the details of this but rest assured that if this goes wrong, Intel/AMD would likely issue a processor recall! This also means that we do not have to care about out of order execution etc. Results are always guaranteed to retire in order - otherwise we are stuffed!

    With Interlocked Increment, the processor needs to go out, fetch the value from the address given, then increment and write it back -- all that while having exclusive ownership of the entire cache line (lock xadd) to make sure no other processors can modify its value.

    With volatile, you'll still end up with just 1 instruction (assuming the JIT is efficient as it should) - inc dword ptr [m_Var]. However, the processor (cpuA) doesn't ask for exclusive ownership of the cache line while doing all it did with the interlocked version. As you can imagine, this means other processors could write an updated value back to m_Var after it's been read by cpuA. So instead of now having incremented the value twice, you end up with just once.

    Hope this clears up the issue.

    For more info, see 'Understand the Impact of Low-Lock Techniques in Multithreaded Apps' - http://msdn.microsoft.com/en-au/magazine/cc163715.aspx

    p.s. What prompted this very late reply? All the replies were so blatantly incorrect (especially the one marked as answer) in their explanation I just had to clear it up for anyone else reading this. shrugs

    p.p.s. I'm assuming that the target is x86/x64 and not IA64 (it has a different memory model). Note that Microsoft's ECMA specs is screwed up in that it specifies the weakest memory model instead of the strongest one (it's always better to specify against the strongest memory model so it is consistent across platforms - otherwise code that would run 24-7 on x86/x64 may not run at all on IA64 although Intel has implemented similarly strong memory model for IA64) - Microsoft admitted this themselves - http://blogs.msdn.com/b/cbrumme/archive/2003/05/17/51445.aspx.

提交回复
热议问题