From a software point of view, what is the latency between an instruction that dirties a memory page and when the core actually marks the page dirty in the Page Table Entry (PTE
AMD64 Architecture Programmer’s Manual Volume 2: System Programming (revision 3.22, Sept. 2012)
In general, Dirty bit updates are ordered with respect to other loads and stores, although not necessarily with respect to accesses to WC memory; in particular, they may not cause WC buffers to be flushed. However, to ensure compatibility with future processors, a serializing operation should be inserted before reading the D bit.
(Emphasis mine.)