The kernel source file Documentation/memory-barriers.txt contains an illustration like this:
CPU 1                 CPU 2
======================
The key point being missed is the mistaken assumption that, for the sequence:
LOAD C (gets &B)
LOAD *C (reads B)
the first load has to precede the second load. A weakly ordered architecture can act "as if" the following happened:
LOAD B (reads B)
LOAD C (reads &B)
if (C != &B)
    LOAD *C
else
    Congratulate self on having already loaded *C
The speculative "LOAD B" can happen, for example, because B was on the same cache line as some other variable of earlier interest, or because hardware prefetching grabbed it.
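The "as if" sequence can be mimicked in plain single-threaded C to show why the result is stale. This is only a rough sketch under stated assumptions, not how any real CPU is implemented: the variables A, B and C mirror the names used here, while writer(), reader_speculative(), reader_ordered() and reset() are hypothetical helpers, and the "other CPU" is modelled by calling writer() explicitly between the two loads.

```c
/* Hypothetical shared state, named after the variables in the text. */
static int A = 1;
static int B = 2;
static int *C = &A;   /* C initially points at A */

/* Writer: store the new value of B, then publish &B through C. */
static void writer(void)
{
    B = 4;
    C = &B;
}

/* Reader that behaves "as if" B had been loaded before C.
 * The explicit writer() call models the unlucky interleaving with the
 * other CPU; real hardware needs no such call. */
static int reader_speculative(void)
{
    int early_b = B;      /* LOAD B  (speculative, sees the old value) */
    writer();             /* the other CPU runs here */
    int *p = C;           /* LOAD C  (reads &B) */
    if (p != &B)
        return *p;        /* LOAD *C */
    return early_b;       /* reuse the value that was loaded earlier */
}

/* Reader whose second load really happens after the first. */
static int reader_ordered(void)
{
    writer();             /* the other CPU has already finished */
    int *p = C;           /* LOAD C  (reads &B) */
    return *p;            /* LOAD *C (reads the current B) */
}

/* Restore the initial state so each reader starts from the same point. */
static void reset(void)
{
    A = 1;
    B = 2;
    C = &A;
}
```

After reset(), reader_speculative() returns the old value 2 even though C already points at B and B holds 4, whereas reader_ordered() returns 4. The model only illustrates the outcome; on real weakly ordered hardware the effect comes from caches and prefetching, not from an explicit function call.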