What are some ideas for cross-modifying code that could trigger unexpected behavior on x86 or x86-x64 systems, where everything is done correctly in the cross
Think of a processor that has a very long instruction pipeline where registers and memory are only modified in the last pipeline stage. When you write self modifying code for this processor and modify an instruction in memory that is already present in the pipeline, the modification will have no effect. In this case the behaviour of the program depends on how long the pipeline of the processor is.
To make new processors with longer pipelines behave exactly as older models, Intel processors include a mechanism that flushes (empties) the pipeline if this case is detected. After the flush, the modified code is fetched into the pipeline, so the new processor behaves exactly as old ones.
A serializing instruction is another way to flush the pipeline. When it reaches the end of the pipeline, the pipeline is flushed and starts fetching again after the serializing instruction.
So what the errata is essentially saying is that some processor models do not check if writes from other processors overwrite instructions that are already executing in their pipeline. The check works only for local writes, not for external writes. But if you insert a serializing instruction you force the processor to flush the pipeline and everything will behave as expected.
To reproduce the behaviour described in the errata you need to make sure that the code you are modifying from one processor is inside the pipeline of the other processor. Take a look at branch prediction (decides which code path is inside the pipeline) and synchronization primitives.