Multiple accesses to main memory and out-of-order execution

忘了有多久 2020-12-22 12:08

Let us assume that I have two pointers, a and b, that point to unrelated addresses that are not cached, so they will both have to come all the way from main memory when being dereferenced. Can the load through the second pointer be issued before the data for the first one arrives, so that the two memory latencies overlap instead of adding up?

1 Answer
  • 2020-12-22 12:25

    Modern CPUs have multiple load buffers, so multiple loads can be outstanding at the same time. The memory subsystem is heavily pipelined, giving many parts of it much better throughput than latency. For example, with prefetching Haswell can sustain an 8-byte load from main memory every clock, but the latency when the address isn't known ahead of time is in the hundreds of cycles.
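
    As a rough illustration (hypothetical C functions, not from the question): loads whose addresses are all known up front can be in flight at the same time, while a pointer chase serializes them because each address comes out of the previous load.

        #include <stdio.h>

        struct node { struct node *next; long value; };

        /* Independent loads: both addresses are known before either load
         * starts, so both cache misses can be outstanding at once and the
         * total cost is roughly one memory latency, not two. */
        long sum_independent(const long *p, const long *q)
        {
            return *p + *q;            /* two loads, no data dependency */
        }

        /* Dependent loads (pointer chasing): each load's address is the
         * result of the previous load, so the misses serialize and each
         * one pays the full main-memory latency. */
        long sum_chain(const struct node *n, int count)
        {
            long total = 0;
            for (int i = 0; i < count; i++) {
                total += n->value;     /* address depends on the previous load */
                n = n->next;
            }
            return total;
        }

        int main(void)
        {
            long a = 1, b = 2;
            struct node n2 = { NULL, 20 };
            struct node n1 = { &n2, 10 };
            printf("%ld %ld\n", sum_independent(&a, &b), sum_chain(&n1, 2));
            return 0;
        }

    With main-memory latency in the hundreds of cycles, the two misses in sum_independent overlap to roughly one latency, while every hop in sum_chain costs a full latency.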

    So yes, a Haswell core can keep track of up to 72 outstanding load uops waiting for data from cache / memory. (This is per-core. The shared L3 cache also needs some buffers to handle the whole system's loads / stores to DRAM and memory-mapped IO.)

    Haswell's reorder buffer (ROB) holds 192 uops, so up to 190 uops of work from code that does not use a or b can be issued and executed while the loads of a and b are the oldest instructions that haven't retired. Instructions / uops are retired in order to support precise exceptions. The ROB size is essentially the limit of the out-of-order window for hiding the latency of slow operations like cache misses.
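
    As a sketch of that window (hypothetical function, assuming the data array is already hot in cache): the loop below has no data dependency on *a or *b, so its uops can issue and execute out of order while both loads are still waiting on DRAM, up to the limit of the ROB.

        /* Illustration only: the hardware, not the source order, decides the
         * overlap.  The only requirement is that the later work does not
         * depend on the pending loads and fits in the out-of-order window. */
        long overlap_example(const long *a, const long *b, const int *data, int n)
        {
            long x = *a;                   /* may miss all the way to main memory  */
            long y = *b;                   /* second miss can be in flight at once */

            long busy = 0;
            for (int i = 0; i < n; i++)    /* independent work: can execute while  */
                busy += data[i] * 3;       /* the loads above are still pending    */

            return x + y + busy;           /* only here do we finally need x and y */
        }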

    Also see other links at the x86 tag wiki to learn how CPUs work. Agner Fog's microarch guide is great for having a mental model of the CPU pipeline to let you understand approximately how code will execute.

    From David Kanter's Haswell writeup: (microarchitecture block diagram; image not reproduced here)
