How are the modern Intel CPU L3 caches organized?

前端 未结 2 632
隐瞒了意图╮
隐瞒了意图╮ 2021-01-03 12:23

Given that CPUs are now multi-core and have their own L1/L2 caches, I was curious as to how the L3 cache is organized given that its shared by multiple cores. I would imagin

相关标签:
2条回答
  • 2021-01-03 12:30

    There is single (sliced) L3 cache in single-socket chip, and several L2 caches (one per real physical core). L3 cache caches data in segments of size of 64 bytes (cache lines), and there is special Cache coherence protocol between L3 and different L2/L1 (and between several chips in the NUMA/ccNUMA multi-socket systems too); it tracks which cache line is actual, which is shared between several caches, which is just modified (and should be invalidated from other caches). Some of protocols (cache line possible states and state translation): https://en.wikipedia.org/wiki/MESI_protocol, https://en.wikipedia.org/wiki/MESIF_protocol, https://en.wikipedia.org/wiki/MOESI_protocol

    In older chips (epoch of Core 2) cache coherence was snooped on shared bus, now it is checked with help of directory.

    In real life L3 is not just "single" but sliced into several slices, each of them having high-speed access ports. There is some method of selecting the slice based on physical address, which allow multicore system to do many accesses at every moment (each access will be directed by undocumented method to some slice; when two cores uses same physical address, their accesses will be served by same slice or by slices which will do cache coherence protocol checks). Information about L3 cache slices was reversed in several papers:

    • https://cmaurice.fr/pdf/raid15_maurice.pdf Reverse Engineering Intel Last-Level Cache Complex Addressing Using Performance Counters
    • https://eprint.iacr.org/2015/690.pdf Systematic Reverse Engineering of Cache Slice Selection in Intel Processors
    • https://arxiv.org/pdf/1508.03767.pdf Cracking Intel Sandy Bridge’s Cache Hash Function

    With recent chips programmer has ability to partition the L3 cache between applications "Cache Allocation Technology" (v4 Family): https://software.intel.com/en-us/articles/introduction-to-cache-allocation-technology https://software.intel.com/en-us/articles/introduction-to-code-and-data-prioritization-with-usage-models https://danluu.com/intel-cat/ https://lwn.net/Articles/659161/

    0 讨论(0)
  • 2021-01-03 12:38

    Modern Intel L3 caches (since Nehalem) use a 64B line size, the same as L1/L2. They're shared, and inclusive.

    See also http://www.realworldtech.com/nehalem/2/

    Since SnB at least, each core has part of the L3, and they're on a ring bus. So in big Xeons, L3 size scales linearly with number of cores.


    See also Which cache mapping technique is used in intel core i7 processor? where I wrote a much larger and more complete answer.

    0 讨论(0)
提交回复
热议问题