Performance Impact of Nested Vectors vs. Contiguous Arrays

耗尽温柔 提交于 2021-02-08 14:40:11

问题


Have there been any reliable tests that clearly display the performance differences between accessing and writing to nested vectors versus C++'s built-in arrays? I've heard that using nested (multi-dimensional) vectors typically have some performance overhead as compared to accessing elements in a single array (where all elements are stored in contiguous memory), but this just all seems to be hypothetical to me. I have yet to see any tests that actually show these differences. Are they significant? I'm sure that it depends on the scenario, but as an inexperienced programmer, I'm not quite sure at what level these differences do become significant.


回答1:


It definitely depends on the scenario, to the extent that I don't think it's possible to answer in a general way which approach is fastest. The fastest approach is going to be the one where the access patterns have the best data locality - which depends highly on the access pattern as well as how the structures are laid out in memory, which in the case of nested vectors is dependent on the allocator and probably varies quite a bit between compilers.

I'd follow the general rule of optimization, which is to first write things in the most straightforward way and then attempt optimization when you can prove there is a bottleneck.




回答2:


Two things contribute to the runtime differences between nested and flattened arrays:
Caching behaviour and indirection

  • CPUs use a hierarchy of Caches to avoid accessing the RAM directly too frequently. This exploits the fact that most memory accesses are either contiguous or have a certain temporal locality, i.e. what was accessed recently will be accessed again soon.
    This means that if the innermost arrays of your nested array is rather large, you will notice small to no differences to a flat array if you iterate over the values in a contiguous fashion. This means that when iterating over a range of values, for flat arrays, your innermost loop should iterate over consecutive elements, for nested arrays, your innermost loop should iterate over the innermost array.
  • If however your access patterns are random, the most important difference in timing are indirections:
    For a flat array, you use something like A[(Z * M + Y) * N + X], so you do 4 arithmetic operations and then a memory access.
    For a nested array, you use A[Z][Y][X], so there are actually three interdependent memory accesses: You need to know A[Z] before you can access A[Z][Y] and so on. Because of the superscalar architecture of modern CPUs, operations that can be executed in parallel are especially efficient, interdependent operations not so much. So you have some arithmetic operations and a memory load on the one side and three interdependent loads on the other side, which is significantly slower. It might be possible that for nested arrays, the contents of A and also A[Z] for some values of Z can be found in the cache hierarchy, but if your nested array is sufficiently large, it will never fit completely into the cache, thus leading to multiple cache misses and memory loads (nested) instead of just a single cache miss and load (flat) for a single random access into the array.

Also see his question, especially the shorter answers below for a more detailed discussion of caching (my answer) and indirection (Peter's answer), which also provides an example where there are no noticeable differences between nested and flat arrays (after fixing the indexing bug of course ;) )

So if you want to know if there are significant runtime differences between them, my answer would be:

  • If you do random access, you will definitely notice the multiple indirections, thus leading to a larger runtime for nested arrays.

  • If you do contiguous access and use the correct ordering of the loops (innermost loop = innermost array/innermost index for flat array) and your innermost dimension of the multi-dimensional array is large enough, then the difference will be negligable, since the compiler will be able to move all indirections out of the innermost loop.



来源:https://stackoverflow.com/questions/45747848/performance-impact-of-nested-vectors-vs-contiguous-arrays

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!