I recently stumbled upon this Wikipedia article. From my experience with multi-threading, I am aware of the multitude of issues caused by the program being able to switch threads.
You should not be running things that need to happen in order in different threads. Threads are for processing things in parallel, so if the order is important, it needs to be done serially.
I will address your question as one about multithreading in a high-level language, rather than discussing CPU pipeline optimization.
> Can anyone explain how to correctly deal with the possibility of reordered operations in a multi-threaded environment?
Most, if not all, modern high-level multithreaded languages provide constructs for managing this potential for the compiler to reorder the logical execution of instructions. In C#, these include field-level constructs (the `volatile` modifier), block-level constructs (the `lock` keyword), and imperative constructs (`Thread.MemoryBarrier`).
Applying `volatile` to a field causes all accesses to that field in the CPU/memory to be executed in the same relative order in which they occur in the instruction sequence (source code).
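As a rough sketch (the class and member names here are mine, not from the answer), this is the typical flag-publication pattern `volatile` is used for:

```csharp
using System;
using System.Threading;

class VolatileFlagExample
{
    // volatile prevents the reading thread from caching _done in a register
    // and keeps the two writes below in their source-code order.
    private static volatile bool _done;
    private static int _result;

    static void Main()
    {
        var worker = new Thread(() =>
        {
            _result = 42;   // ordinary write
            _done = true;   // volatile write: cannot be moved before the line above
        });
        worker.Start();

        while (!_done) { }          // volatile read: re-fetched on every iteration
        Console.WriteLine(_result); // prints 42 once _done is observed as true
    }
}
```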
Using `lock` around a block of code causes the enclosed instruction sequence to be executed in the same relative order in which it occurs in the parent block of code.
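A minimal sketch of the `lock` case, again with hypothetical names; the block both serializes access and keeps the enclosed operations from leaking out of it:

```csharp
class Counter
{
    private readonly object _sync = new object();
    private int _count;

    public void Increment()
    {
        lock (_sync)   // no enclosed memory operation is moved outside the block
        {
            _count++;  // the read-modify-write is safe against other lockers
        }
    }

    public int Read()
    {
        lock (_sync) { return _count; }
    }
}
```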
The `Thread.MemoryBarrier` method indicates to the compiler that the CPU must not reorder memory accesses around this point in the instruction sequence. This enables more advanced techniques for specialized requirements.
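A sketch of the publish/consume pattern built from explicit full fences (illustrative names; in most code you'd reach for `lock` or `volatile` before this):

```csharp
using System;
using System.Threading;

class BarrierPublish
{
    private static int _value;
    private static bool _ready;

    static void Publisher()
    {
        _value = 42;
        Thread.MemoryBarrier(); // the write above cannot move past this fence
        _ready = true;
    }

    static void Consumer()
    {
        while (!_ready)
            Thread.MemoryBarrier(); // forces a fresh read of _ready on each pass
        Thread.MemoryBarrier();     // the read below cannot move above this fence
        Console.WriteLine(_value);  // prints 42
    }

    static void Main()
    {
        var consumer = new Thread(Consumer);
        consumer.Start();
        Publisher();
        consumer.Join();
    }
}
```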
The techniques above are described in order of increasing complexity and performance. As with all concurrency programming, determining when and where to apply these techniques is the challenge.

When synchronizing access to a single field, the `volatile` keyword will work, but it could prove to be overkill. Sometimes you only need to synchronize writes (in which case a `ReaderWriterLockSlim` would accomplish the same thing with much better performance). Sometimes you need to manipulate the field multiple times in quick succession, or you must check a field and conditionally manipulate it; in these cases, the `lock` keyword is a better idea. Sometimes you have multiple threads manipulating shared state in a very loosely synchronized model to improve performance (not typically recommended); in that case, carefully placed memory barriers can prevent stale and inconsistent data from being used in threads.
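For the read-mostly case mentioned above, a sketch along these lines (the cache class is hypothetical, not from the answer) shows the shape of it: many readers proceed concurrently, while writers get exclusive access:

```csharp
using System.Collections.Generic;
using System.Threading;

class ReadMostlyCache
{
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();
    private readonly Dictionary<string, string> _map = new Dictionary<string, string>();

    public bool TryGet(string key, out string value)
    {
        _rw.EnterReadLock();   // many threads may hold the read lock at once
        try { return _map.TryGetValue(key, out value); }
        finally { _rw.ExitReadLock(); }
    }

    public void Set(string key, string value)
    {
        _rw.EnterWriteLock();  // writers exclude both readers and other writers
        try { _map[key] = value; }
        finally { _rw.ExitWriteLock(); }
    }
}
```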
> How do you prevent the possibility of out-of-order execution occurring and blowing up in your face?
You don't - the compiler can only change the order of execution when doing so doesn't alter the end result.
So essentially you're asking about the memory consistency model. Some languages/environments, such as Java and .NET, define a memory model, and it's the programmer's responsibility not to do things that are disallowed or that result in undefined behavior. If you're unsure about the atomicity of "normal" operations, it's better to be safe than sorry and just use the mutex primitives.
For C and C++ the situation is not as nice, as those language standards don't define a memory model. And no, contrary to the unfortunately popular opinion, `volatile` doesn't guarantee anything with regard to atomicity. In this case, you have to rely on the platform threads library (which, among other things, executes the required memory barriers) or compiler/hardware-specific atomic intrinsics, and hope that the compiler doesn't do any optimizations that break program semantics. As long as you avoid conditional locking within a function (or translation unit, if using interprocedural analysis) you ought to be relatively safe.
Luckily, C++0x and the next C standard are rectifying this issue by defining a memory model. I asked a question related to this and, as it turned out, to conditional locking here; the question contains links to some documents that go into the issue in some detail. I recommend reading those documents.
The fact of the matter is that if you're only just starting to deal with multithreaded code (to the point that you're explicitly talking about thread scheduling as if it's somewhat scary [not to say it isn't, but for different reasons]), this is happening at a much, much lower level than you need to worry about. As others have said, compilers will not reorder things if they cannot guarantee correctness, and while it's good to know that technologies like this exist, unless you're writing your own compiler or doing really bare-metal stuff, it shouldn't present an issue.
It's not the compiler, it's the CPU. (Well, both actually, but the CPU is the harder one to control.) Regardless of how your code gets compiled, the CPU will look ahead in the instruction stream and execute things out of order. Typically, for example, it will start a read early, since memory is slower than the CPU (i.e. start it early and hope the result arrives before you actually need it).
Both the CPU and the compiler optimize based on the same rule: reorder anything as long as it doesn't affect the results of the program, *assuming a single-threaded, single-processor environment*.
So there's the problem: they optimize for single-threaded execution even when your program isn't single-threaded. Why? Because otherwise everything would be 100x slower. Really. And most of your code is effectively single-threaded (i.e. has no multi-threaded interaction); only small parts need to interact in a multi-threaded way.
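If you want to see the CPU's hand in this, the classic store/load litmus test can be sketched as follows (timing- and hardware-dependent; it may take many iterations to reproduce, or never fire at all):

```csharp
using System;
using System.Threading;

class LitmusTest
{
    static int x, y, r1, r2;

    static void Main()
    {
        for (int i = 0; ; i++)
        {
            x = y = r1 = r2 = 0;
            var t1 = new Thread(() => { x = 1; r1 = y; });
            var t2 = new Thread(() => { y = 1; r2 = x; });
            t1.Start(); t2.Start();
            t1.Join(); t2.Join();

            // Impossible under any simple interleaving of the two threads,
            // yet it can happen because each CPU may let its read overtake
            // its own earlier write (store buffering).
            if (r1 == 0 && r2 == 0)
            {
                Console.WriteLine("reordering observed on iteration " + i);
                break;
            }
        }
    }
}
```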
The best/easiest/safest way to control this is with locks - mutexes, semaphores, events, etc.
Only if you really, really need to optimize (based on careful measurement) should you look into memory barriers and atomic operations. These are the underlying instructions that are used to build mutexes and the like, and when used correctly they limit out-of-order execution.
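As a sketch of the atomic route (hypothetical class; `Interlocked.Increment` and `Thread.VolatileRead` are the relevant BCL calls), a single shared counter can skip the mutex entirely:

```csharp
using System.Threading;

class AtomicCounter
{
    private int _count;

    public void Increment()
    {
        // A hardware atomic read-modify-write with a full fence,
        // much cheaper than taking and releasing a lock.
        Interlocked.Increment(ref _count);
    }

    public int Read()
    {
        // A volatile read: later operations cannot be reordered before it.
        return Thread.VolatileRead(ref _count);
    }
}
```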
But before doing that kind of optimization, check that the algorithms and code flow are correct, and see whether you could further minimize multi-threaded interactions.