Why are there so many implementations of Object Pooling in Roslyn?

Asked by 野的像风 on 2021-01-31 15:26

The ObjectPool is a type used in the Roslyn C# compiler to reuse frequently used objects which would normally get new'ed up and garbage collected very often. This reduces the amount of garbage collection that has to happen.

1 Answer

  • Answered 2021-01-31 16:05

    I'm the lead for the Roslyn performance v-team. All object pools are designed to reduce the allocation rate and, therefore, the frequency of garbage collections. This comes at the expense of adding long-lived (gen 2) objects. It helps compiler throughput slightly, but the major effect is on Visual Studio responsiveness when using VB or C# IntelliSense.

    "why there are so many implementations"

    There's no quick answer, but I can think of three reasons:

    1. Each implementation serves a slightly different purpose and they are tuned for that purpose.
    2. "Layering" - All the pools are internal and internal details from the Compiler layer may not be referenced from the Workspace layer or vice versa. We do have some code sharing via linked files, but we try to keep it to a minimum.
    3. No great effort has gone into unifying the implementations you see today.

    "what the preferred implementation is"

    ObjectPool<T> is the preferred implementation and what the majority of code uses. Note that ObjectPool<T> is used by ArrayBuilder<T>.GetInstance() and that's probably the largest user of pooled objects in Roslyn. Because ObjectPool<T> is so heavily used, this is one of the cases where we duplicated code across the layers via linked files. ObjectPool<T> is tuned for maximum throughput.
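
    For readers who want the shape of it, here is a simplified, illustrative sketch in the spirit of ObjectPool<T> (the real type is internal to Roslyn and differs in detail): a dedicated first slot as the fast path, an interlocked scan over a small fixed array, and a fall-through to a plain allocation on a miss.

    ```csharp
    using System;
    using System.Threading;

    // Illustrative sketch only; Roslyn's internal ObjectPool<T> differs in detail.
    internal sealed class SimpleObjectPool<T> where T : class
    {
        private readonly Func<T> _factory; // creates a new instance on a pool miss
        private readonly T[] _items;       // small fixed-size storage, tuned per workload
        private T _firstItem;              // fast path for the common case

        public SimpleObjectPool(Func<T> factory, int size = 20)
        {
            _factory = factory;
            _items = new T[size - 1];
        }

        public T Allocate()
        {
            // Fast path: take the first slot with a single interlocked operation.
            var item = _firstItem;
            if (item != null && Interlocked.CompareExchange(ref _firstItem, null, item) == item)
                return item;

            // Slow path: scan the remaining slots.
            for (int i = 0; i < _items.Length; i++)
            {
                item = _items[i];
                if (item != null && Interlocked.CompareExchange(ref _items[i], null, item) == item)
                    return item;
            }

            // Pool miss: fall back to a normal allocation.
            return _factory();
        }

        public void Free(T item)
        {
            // Plain writes are enough here: losing a race just means the object
            // is dropped and garbage collected, which is always safe.
            if (_firstItem == null)
            {
                _firstItem = item;
                return;
            }
            for (int i = 0; i < _items.Length; i++)
            {
                if (_items[i] == null)
                {
                    _items[i] = item;
                    return;
                }
            }
            // Pool full: let the GC have the object.
        }
    }
    ```

    The drop-on-overflow behavior is what keeps a pool like this bounded: a burst of activity never grows the pool beyond its fixed size.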

    At the workspace layer, you'll see that SharedPool<T> tries to share pooled instances across disjoint components to reduce overall memory usage. We were trying to avoid having each component create its own pool dedicated to a specific purpose and, instead, share based on the type of element. A good example of this is the StringBuilderPool.
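
    To make the sharing idea concrete, here is a minimal sketch under hypothetical names (SharedPoolSketch, Cache<T>), reusing the SimpleObjectPool<T> sketched above; the real SharedPool<T> and StringBuilderPool types are internal and shaped differently. A static field nested in a generic type gives every component that asks for the same T the same pool instance.

    ```csharp
    using System.Text;

    // Hypothetical sketch of type-keyed pool sharing across components.
    internal static class SharedPoolSketch
    {
        private static class Cache<T> where T : class, new()
        {
            // One pool per closed generic type, shared process-wide.
            internal static readonly SimpleObjectPool<T> Instance =
                new SimpleObjectPool<T>(() => new T(), size: 20);
        }

        public static SimpleObjectPool<T> Default<T>() where T : class, new()
            => Cache<T>.Instance;
    }

    internal static class Example
    {
        internal static string Demo()
        {
            // Two unrelated components asking for StringBuilder get the same pool.
            var pool = SharedPoolSketch.Default<StringBuilder>();
            var sb = pool.Allocate();
            sb.Append("hello");
            var result = sb.ToString();
            sb.Clear();
            pool.Free(sb);
            return result;
        }
    }
    ```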

    "why they picked a pool size of 20, 100 or 128"

    Usually, this is the result of profiling and instrumentation under typical workloads. We usually have to strike a balance between allocation rate ("misses" in the pool) and the total live bytes in the pool. The two factors at play are:

    1. The maximum degree of parallelism (concurrent threads accessing the pool)
    2. The access pattern, including overlapped and nested allocations (the probe sketch just after this list illustrates both).
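
    To make those two factors concrete, here is a hypothetical probe - not Roslyn's actual instrumentation - that counts pool misses under a simulated workload with 8 concurrent workers, each holding two rented objects at once. Starting cold, the miss count approximates the peak number of simultaneously live objects (up to 2 x 8 = 16 here), which is exactly the kind of number that drives a pool-size choice like 20.

    ```csharp
    using System;
    using System.Collections.Concurrent;
    using System.Threading;
    using System.Threading.Tasks;

    internal static class PoolSizingProbe
    {
        private static int s_misses;
        private static readonly ConcurrentBag<object> s_pool = new ConcurrentBag<object>();

        private static object Rent()
        {
            if (s_pool.TryTake(out var item))
                return item;
            Interlocked.Increment(ref s_misses); // a miss is a real allocation
            return new object();
        }

        private static void Return(object item) => s_pool.Add(item);

        private static void Main()
        {
            var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };
            Parallel.For(0, 100_000, options, _ =>
            {
                var a = Rent();
                var b = Rent();        // nested rent: two live objects per worker
                Thread.SpinWait(100);  // simulate work while holding both
                Return(b);
                Return(a);
            });
            // Approximates peak simultaneous demand on the pool.
            Console.WriteLine($"misses: {s_misses}");
        }
    }
    ```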

    In the grand scheme of things, the memory held by objects in the pool is very small compared to the total live memory (the size of the Gen 2 heap) for a compilation. But we also take care not to return giant objects (typically large collections) back to the pool - we'll just drop them on the floor with a call to ForgetTrackedObject.
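
    As an illustration of that guard, here is a hedged sketch reusing the SimpleObjectPool<T> from above; the wrapper name and the 1024 threshold are made up, but Roslyn's PooledStringBuilder performs an equivalent capacity check before freeing.

    ```csharp
    using System.Text;

    // Hypothetical wrapper showing the "don't pool giants" guard.
    internal static class PooledStringBuilderSketch
    {
        private const int MaxPooledCapacity = 1024; // illustrative threshold

        private static readonly SimpleObjectPool<StringBuilder> s_pool =
            new SimpleObjectPool<StringBuilder>(() => new StringBuilder(), size: 32);

        public static StringBuilder GetInstance() => s_pool.Allocate();

        public static string ToStringAndFree(StringBuilder builder)
        {
            var result = builder.ToString();
            if (builder.Capacity <= MaxPooledCapacity)
            {
                builder.Clear();      // reset state before the next user sees it
                s_pool.Free(builder); // small enough to be worth keeping
            }
            // An oversized builder is simply dropped for the GC, so one huge
            // string doesn't pin a large buffer in the pool forever.
            return result;
        }
    }
    ```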

    For the future, I think one area we can improve is to have pools of byte arrays (buffers) with constrained lengths. This will help, in particular, the MemoryStream implementation in the emit phase (PEWriter) of the compiler. These MemoryStreams require contiguous byte arrays for fast writing but they are dynamically sized. That means they occasionally need to resize - usually doubling in size each time. Each resize is a new allocation, but it would be nice to be able to grab a resized buffer from a dedicated pool and return the smaller buffer back to a different pool. So, for example, you would have a pool for 64-byte buffers, another for 128-byte buffers and so on. The total pool memory would be constrained, but you avoid "churning" the GC heap as buffers grow.
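
    A minimal sketch of that bucketing idea, under hypothetical Rent/Return names (the System.Buffers.ArrayPool<byte> that later shipped in .NET is a production implementation of essentially this scheme). For brevity the sketch does not cap how many buffers each bucket retains, which a real implementation would.

    ```csharp
    using System;
    using System.Collections.Concurrent;

    // Hypothetical size-bucketed byte-buffer pool: one bucket per
    // power-of-two length from 64 bytes up to 64 KB.
    internal static class BufferPoolSketch
    {
        private const int MinLength = 64;
        private const int BucketCount = 11; // 64 << 10 == 65536

        private static readonly ConcurrentBag<byte[]>[] s_buckets = CreateBuckets();

        private static ConcurrentBag<byte[]>[] CreateBuckets()
        {
            var buckets = new ConcurrentBag<byte[]>[BucketCount];
            for (int i = 0; i < buckets.Length; i++)
                buckets[i] = new ConcurrentBag<byte[]>();
            return buckets;
        }

        private static int BucketIndex(int length)
        {
            int index = 0;
            for (int size = MinLength; size < length; size <<= 1)
                index++;
            return index;
        }

        public static byte[] Rent(int minimumLength)
        {
            int index = BucketIndex(minimumLength);
            if (index >= BucketCount)
                return new byte[minimumLength]; // too big to pool
            if (s_buckets[index].TryTake(out var buffer))
                return buffer;
            return new byte[MinLength << index]; // miss: allocate the bucket size
        }

        public static void Return(byte[] buffer)
        {
            int index = BucketIndex(buffer.Length);
            // Only keep exact bucket sizes in range; anything else goes to the GC.
            if (index < BucketCount && buffer.Length == MinLength << index)
                s_buckets[index].Add(buffer);
        }
    }
    ```

    The doubling scenario then becomes: rent the next bucket up, copy, and return the old buffer to its own bucket instead of abandoning it to the GC.

    ```csharp
    var old = BufferPoolSketch.Rent(256);
    var grown = BufferPoolSketch.Rent(old.Length * 2);
    Buffer.BlockCopy(old, 0, grown, 0, old.Length);
    BufferPoolSketch.Return(old); // the smaller buffer stays available for reuse
    ```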

    Thanks again for the question.

    Paul Harrington.
