In LINQ, why are subsequent calls of IEnumerable.Intersect so much faster?

时光说笑 2021-01-19 18:10

While looking at the question "C# Similarities of two arrays", it was noted that the initial LINQ call was significantly slower than subsequent calls. What is being cached th

5 answers
  • 2021-01-19 18:53

    LINQ heavily uses deferred execution. Unless you enumerate a query, it does not get executed.

    Change

     s.Start();
     z= a.Intersect(b);
     s.Stop();
    

    to

     s.Start();
     z = a.Intersect(b).ToArray();
     s.Stop();
    

    and please post the new performance results.

    a.Intersect(b) represents an expression, independent of the value of a and b. Values of a and b are only used when the expression is evaluated by enumeration.
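
    The point about values only being read at enumeration time can be checked with a small sketch. The class name and sample lists here are illustrative assumptions, not taken from the original question.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    class DeferredExecutionDemo
    {
        // Hypothetical helper: builds the query, changes a source, then enumerates.
        public static int[] Run()
        {
            var a = new List<int> { 1, 2, 3 };
            var b = new List<int> { 2, 3, 4 };

            // Building the query does not read the contents of a or b yet.
            IEnumerable<int> query = a.Intersect(b);

            // Change a source after the query has been built.
            b.Add(1);

            // Enumeration happens here, so the query sees the updated b.
            return query.ToArray();
        }

        static void Main()
        {
            Console.WriteLine(string.Join(", ", Run())); // prints "1, 2, 3"
        }
    }
    ```

    If Intersect ran eagerly, the result would be "2, 3", because 1 was not yet in b when the query was built.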

  • 2021-01-19 18:53

    You are enumerating the result of Intersect() only when you call Count(); that's when the calculation of the intersection actually occurs. The part you're timing is the creation of the enumerable object that represents the future calculation of the intersection.
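
    One way to see this is to count how many elements are actually read from the inputs. The following is a minimal sketch; the Counted helper and the Pulled counter are hypothetical names introduced only for the illustration.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    class CreationVersusEnumeration
    {
        // Number of elements handed out by Counted() so far.
        public static int Pulled;

        // Hypothetical helper: passes elements through and counts each one read.
        public static IEnumerable<int> Counted(IEnumerable<int> source)
        {
            foreach (var item in source)
            {
                Pulled++;
                yield return item;
            }
        }

        static void Main()
        {
            var a = Counted(new[] { 1, 2, 3 });
            var b = Counted(new[] { 2, 3, 4 });

            // Only the enumerable object is created here.
            var z = a.Intersect(b);
            Console.WriteLine(Pulled); // prints 0: nothing has been read yet

            // Count() enumerates the query, which reads both inputs.
            int count = z.Count();
            Console.WriteLine(Pulled); // prints 6: both inputs were read in full
            Console.WriteLine(count);  // prints 2
        }
    }
    ```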

    In addition to the jitting penalty others have noted, the first call to Intersect() might be the first use of a type from System.Core.dll, so you might be looking at the time required to load the IL code into memory, as well.

  • 2021-01-19 19:02

    I would expect the first run of any loop to be slower for three reasons:

    1. Code has to be jitted the first time, but not subsequently.
    2. If the executable code is small enough to fit in the CPU cache, it will not have been evicted after the first run, so later runs load it faster.
    3. If the data is small enough to fit in the CPU cache, it will not have been evicted either, so later runs read it faster.
  • 2021-01-19 19:07

    Enumerable.Intersect does not do any caching. It is implemented using a hash set. The elements of the second sequence are added to the set. Then the first sequence is enumerated, and each element that can be removed from the set is yielded, so the result contains the distinct elements common to both sequences. You have to actually enumerate the result to pay the cost of building the set. This implementation is surprisingly efficient even for small collections.
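
    A hash-set-based intersection can be sketched roughly as follows. This is not the framework source, only an approximation of its observable behavior: it loads the second sequence into a HashSet<T>, then walks the first sequence and yields every element that can be removed from the set. MyIntersect is a hypothetical name.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class IntersectSketch
    {
        // Hypothetical extension method; an approximation, not the real implementation.
        public static IEnumerable<T> MyIntersect<T>(this IEnumerable<T> first, IEnumerable<T> second)
        {
            // Because this is an iterator, nothing here runs until enumeration starts.
            var set = new HashSet<T>(second);

            foreach (var item in first)
            {
                // Remove returns true only once per distinct value,
                // so duplicates in first are yielded a single time.
                if (set.Remove(item))
                    yield return item;
            }
        }
    }

    class Program
    {
        static void Main()
        {
            var a = new[] { 1, 2, 2, 3, 5 };
            var b = new[] { 5, 2, 7 };

            Console.WriteLine(string.Join(", ", a.MyIntersect(b))); // prints "2, 5"
            Console.WriteLine(string.Join(", ", a.Intersect(b)));   // prints "2, 5"
        }
    }
    ```

    Nothing in the sketch survives between calls, which is consistent with the claim that no caching is involved.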

    If you see a difference in performance in subsequent calls it is not because Enumerable.Intersect does any caching but probably because you need to "warm up" your benchmark.
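
    A warmed-up measurement might look like the sketch below. The array sizes, the run count, and the TimeIntersect helper are assumptions chosen for illustration. No timings are shown because they depend on the machine and runtime.

    ```csharp
    using System;
    using System.Diagnostics;
    using System.Linq;

    class WarmUpBenchmark
    {
        // Hypothetical helper: times one full enumeration of a.Intersect(b).
        public static long TimeIntersect(int[] a, int[] b)
        {
            var s = Stopwatch.StartNew();
            var z = a.Intersect(b).ToArray(); // ToArray() forces the query to execute
            s.Stop();
            GC.KeepAlive(z); // keep the result referenced until timing is done
            return s.ElapsedTicks;
        }

        static void Main()
        {
            var a = Enumerable.Range(0, 1000).ToArray();
            var b = Enumerable.Range(500, 1000).ToArray();

            // Warm-up call: pays for JIT compilation and assembly loading,
            // and its result is deliberately discarded.
            TimeIntersect(a, b);

            for (int i = 0; i < 5; i++)
                Console.WriteLine($"run {i}: {TimeIntersect(a, b)} ticks");
        }
    }
    ```

    For anything beyond a quick check, a dedicated benchmarking harness that handles warm-up and statistics is likely a better choice than a hand-rolled Stopwatch loop.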

  • 2021-01-19 19:08

    JITting System.Linq.Enumerable.

    Put new List<T>().Intersect(new List<T>()); new System.Diagnostics.Stopwatch().Stop(); (with T being your element type) as your first line of code and all iterations will take the same amount of time.
