Is prefetching triggered by the stream of exact addresses or by the stream of cache lines?

清歌不尽 2021-02-15 22:46

On modern x86 CPUs, hardware prefetching is an important technique to bring cache lines into various levels of the cache hierarchy before they are explicitly requested by the us

1 answer
  • 2021-02-15 23:24

    Cache line offsets can be useful, but they can also be misleading, as your example shows. I will discuss how line offsets affect the data prefetchers on modern Intel processors, based on my experiments on Haswell.

    The method I followed is simple. First, I disable all the data prefetchers except the one I want to test. Second, I design a sequence of accesses that exhibits a particular pattern of interest, so that the target prefetcher sees the sequence and learns from it. I then follow the sequence with an access to a particular line and accurately measure its latency to determine whether the prefetcher has prefetched that line. The loop doesn't contain any other loads. It does contain one store, which writes the latency measurement to a buffer.
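
    A minimal sketch of this kind of measurement is shown here. It is not the code used for the experiments: the helper names (timed_load, train_then_probe), the fencing sequence, and the use of clflush to start from a cold cache are all assumptions made for illustration. It should be built with optimizations enabled so that the loop counters stay in registers and the training loop issues no extra loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <x86intrin.h> /* __rdtsc, __rdtscp, _mm_clflush, _mm_lfence, _mm_mfence */

#define LINE_SIZE 64

/* Time one load in TSC ticks. The result includes a fixed fencing overhead,
   so only differences between measurements are meaningful. */
static inline uint64_t timed_load(const volatile uint8_t *p)
{
    unsigned aux;
    _mm_mfence();
    _mm_lfence();
    uint64_t start = __rdtsc();
    _mm_lfence();
    uint8_t v = *p;                 /* the probe load (volatile, so it is emitted) */
    (void)v;
    uint64_t end = __rdtscp(&aux);  /* waits for earlier instructions to execute */
    _mm_lfence();
    return end - start;
}

/* Hypothetical helper: flush buf, issue n training loads at
   buf[start], buf[start + stride], ..., then time one load at buf[probe].
   The caller must keep every training access inside [0, len). */
uint64_t train_then_probe(uint8_t *buf, size_t len, size_t start,
                          ptrdiff_t stride, size_t n, size_t probe)
{
    assert(probe < len);

    for (size_t i = 0; i < len; i += LINE_SIZE)
        _mm_clflush(buf + i);       /* start from a cold cache */
    _mm_mfence();

    for (size_t i = 0; i < n; i++) {
        const volatile uint8_t *p = buf + start + (ptrdiff_t)i * stride;
        uint8_t v = *p;             /* training access seen by the prefetcher */
        (void)v;
    }
    return timed_load(buf + probe);
}
```

    The caller would store each returned value in a results buffer and repeat the run many times. A probe latency close to a cache hit suggests the line was prefetched, while a latency close to a memory access suggests it was not.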

    There are four hardware data prefetchers. The DCU prefetcher and the L2 adjacent line prefetcher are not affected by the pattern of line offsets, only by the pattern of 64-byte-aligned addresses.
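
    The answer does not say how the other prefetchers were turned off. One publicly documented mechanism on Intel cores of this generation is MSR 0x1A4, where a set bit disables the corresponding prefetcher. The sketch below assumes that mechanism; the enum names and the disable_all_except helper are hypothetical, and it only computes the value to write.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of MSR 0x1A4 (a set bit DISABLES that prefetcher). */
#define MSR_PREFETCH_CONTROL 0x1A4u   /* hypothetical macro name */

enum {
    PF_L2_STREAMER   = 1 << 0,  /* L2 streaming prefetcher     */
    PF_L2_ADJACENT   = 1 << 1,  /* L2 adjacent line prefetcher */
    PF_DCU_NEXT_LINE = 1 << 2,  /* DCU prefetcher              */
    PF_DCU_IP        = 1 << 3,  /* DCU IP prefetcher           */
    PF_ALL           = 0xF
};

/* Value to write so that only the prefetchers in `keep` stay enabled. */
static inline uint64_t disable_all_except(unsigned keep)
{
    assert((keep & ~(unsigned)PF_ALL) == 0);  /* only the four known bits */
    return (uint64_t)((unsigned)PF_ALL & ~keep);
}
```

    For example, disable_all_except(PF_DCU_IP) yields 0x7, which leaves only the DCU IP prefetcher active. Writing the register requires privileged access and has to be done on every core that runs the test.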

    My experiments don't show any evidence that the L2 streaming prefetcher even receives the cache line offset; it seems to get only the line-aligned address. For example, when the same line is accessed multiple times, the offset pattern by itself does not seem to affect the behavior of the prefetcher.
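
    To make the distinction between the two streams concrete, the sketch below splits each exact address into its 64-byte-aligned line address and its offset within the line. The function names are hypothetical. A prefetcher that receives only the line-aligned address would observe the lines array and nothing else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64u

/* 64-byte-aligned address of the line containing addr. */
static inline uint64_t line_of(uint64_t addr)
{
    return addr & ~(uint64_t)(LINE_SIZE - 1);
}

/* Byte offset of addr within its cache line (0..63). */
static inline unsigned offset_of(uint64_t addr)
{
    return (unsigned)(addr & (LINE_SIZE - 1));
}

/* Split a stream of exact addresses into a line stream and an offset stream. */
void split_stream(const uint64_t *addrs, size_t n,
                  uint64_t *lines, unsigned *offsets)
{
    assert(addrs != NULL && lines != NULL && offsets != NULL);
    for (size_t i = 0; i < n; i++) {
        lines[i]   = line_of(addrs[i]);
        offsets[i] = offset_of(addrs[i]);
    }
}
```

    With the addresses 0x1038, 0x1020, 0x1008, 0x0FF0, the exact-address stream is strictly decreasing, but the line stream is 0x1000, 0x1000, 0x1000, 0x0FC0: three accesses to the same line followed by one access to the previous line.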

    The DCU IP prefetcher shows interesting behavior. I've tested two cases:

    • If a load has decreasing offsets, the prefetcher will prefetch one or more lines both in the forward and backward direction.
    • If a load has increasing offsets, the prefetcher will prefetch one or more lines but only in the forward direction.