Array.Count() much slower than List.Count()

前端 未结 4 1880
梦如初夏
梦如初夏 2021-02-01 18:00

When using the extension method of IEnumerable Count(), an array is at least two times slower than a list.

Function                      Co         


        
相关标签:
4条回答
  • 2021-02-01 18:26

    That is because int[] requires casting, while List<int> does not. If you were to use Length property then result will be quite different - approx. 10x faster than List<int>.Count().

    0 讨论(0)
  • 2021-02-01 18:27

    The reason is that Enumerable.Count<T>() performs a cast to ICollection<T> to retrieve the count both from the list and the array.

    Using this sample code:

    public static int Count<TSource>(IEnumerable<TSource> source)
    {
        ICollection<TSource> collection = source as ICollection<TSource>;
        if (collection != null)
        {
            return 1; // collection.Count;
        }
    }
    

    you can determine that the cast takes much longer for the array, in fact most of the time taken for counting is from this cast:

    Function                      Count()
    List<int>                     1,575
    int[]                         5,069
    

    The key might be this statement from the documentation (emphasis mine):

    In the .NET Framework version 2.0, the Array class implements the System.Collections.Generic.IList, System.Collections.Generic.ICollection, and System.Collections.Generic.IEnumerable generic interfaces. The implementations are provided to arrays at run time, and therefore are not visible to the documentation build tools. As a result, the generic interfaces do not appear in the declaration syntax for the Array class, and there are no reference topics for interface members that are accessible only by casting an array to the generic interface type (explicit interface implementations).

    0 讨论(0)
  • 2021-02-01 18:30

    32-bit profiling analysis (all in ms, only interesting bits, JIT inlining disabled):

    Name    Count   'Inc Time'  'Ex Time'   'Avg Inc Time'  'Avg Ex Time'
    System.Linq.Enumerable::Count(<UNKNOWN>):int32 <System.Int32>   
            20000000    13338.38    7830.49 0.0007  0.0004
    System.SZArrayHelper::get_Count():int32 <System.Int32>  
            10000000    4063.9      2651.44 0.0004  0.0003
    System.Collections.Generic.List<System.Int32>::get_Count():int32    
            10000000    1443.99     1443.99 0.0001  0.0001
    System.Runtime.CompilerServices.JitHelpers::UnsafeCast(Object):System.__Canon <System.__Canon>  
            10000004    1412.46     1412.46 0.0001  0.0001
    

    System.SZArrayHelper::get_Count() seems to call System.Runtime.CompilerServices.JitHelpers::UnsafeCast for the array case.

    For the list, List<int>.Count simply returns the size.

    Inc time is cost including child calls. Ex time is cost of method body only.

    When inlining is disabled, the Array.Count() is twice as slow.

    It could be due to the fact mentioned the now deleted answer. It would appear the attributes applied (ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success) and SecuritySafeCritical) prevents the runtime from inlining the call, hence the big difference (38 times slower in my case in 32-bit mode).

    To profile this yourself:

    Get https://github.com/leppie/IronScheme/raw/master/IronScheme/tools/IronScheme.Profiler.x86.dll Run the app (x86 build only) as:

    regsvr32 IronScheme.Profiler.x86.dll
    set COR_PROFILER={9E2B38F2-7355-4C61-A54F-434B7AC266C0}
    set COR_ENABLE_PROFILING=1
    ConsoleApp1.exe
    

    When app exits, a report.tab file is created which can then be used in Excel.

    0 讨论(0)
  • 2021-02-01 18:40

    I'm posting this, not as an answer, but to provide a more testable environment.

    I have taken a copy of the actual implementation of Enumerable<T>.Count() and changed the original test program to use it, so people can single-step it in the debugger.

    If you run a release version of the code below, you will get similar timings to the OP.

    For both List<T> and int[] the first cast assigned to is2 will be non-null so is2.Count will be called.

    So it would appear the difference is coming from the internal implementation of .Count.

    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Diagnostics;
    
    namespace ConsoleApplication1
    {
        class Program
        {
            public const long Iterations = (long)1e8;
    
            static void Main()
            {
                var list = new List<int>() { 1 };
                var array = new int[1];
                array[0] = 1;
    
                var results = new Dictionary<string, TimeSpan>();
                results.Add("int[]", Benchmark(array, Iterations));
                results.Add("List<int>", Benchmark(list, Iterations));
    
                Console.WriteLine("Function".PadRight(30) + "Count()");
                foreach (var result in results)
                {
                    Console.WriteLine("{0}{1}", result.Key.PadRight(30), Math.Round(result.Value.TotalSeconds, 3));
                }
                Console.ReadLine();
            }
    
            public static TimeSpan Benchmark(IEnumerable<int> source, long iterations)
            {
                var countWatch = new Stopwatch();
                countWatch.Start();
                for (long i = 0; i < iterations; i++) Count(source);
                countWatch.Stop();
    
                return countWatch.Elapsed;
            }
    
            public static int Count<TSource>(IEnumerable<TSource> source)
            {
                ICollection<TSource> is2 = source as ICollection<TSource>;
    
                if (is2 != null)
                    return is2.Count;  // This is executed for int[] AND List<int>.
    
                ICollection is3 = source as ICollection;
    
                if (is3 != null)
                    return is3.Count;
    
                int num = 0;
    
                using (IEnumerator<TSource> enumerator = source.GetEnumerator())
                {
                    while (enumerator.MoveNext())
                        num++;
                }
    
                return num;
            }
        }
    }
    

    Given this information, we can simplify the test to just concentrate on the timing differences between List.Count and Array.Count:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    
    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main()
            {
                int dummy = 0;
                int count = 1000000000;
    
                var array = new int[1] as ICollection<int>;
                var list = new List<int> {0};
    
                var sw = Stopwatch.StartNew();
    
                for (int i = 0; i < count; ++i)
                    dummy += array.Count;
    
                Console.WriteLine("Array elapsed = " + sw.Elapsed);
    
                dummy = 0;
                sw.Restart();
    
                for (int i = 0; i < count; ++i)
                    dummy += list.Count;
    
                Console.WriteLine("List elapsed = " + sw.Elapsed);
    
                Console.ReadKey(true);
            }
        }
    }
    

    The above code gives the following results for a release build run outside the debugger:

    Array elapsed = 00:00:02.9586515
    List elapsed = 00:00:00.6098578
    

    At this point, I thought to myself "surely we can optimise the Count() to recognise T[] and return .Length directly. So I changed the implementation of Count() as follows:

    public static int Count<TSource>(IEnumerable<TSource> source)
    {
        var array = source as TSource[];
    
        if (array != null)        // Optimised for arrays.
            return array.Length;  // This is executed for int[] 
    
        ICollection<TSource> is2 = source as ICollection<TSource>;
    
        if (is2 != null)
            return is2.Count;  // This is executed for List<int>.
    
        ICollection is3 = source as ICollection;
    
        if (is3 != null)
            return is3.Count;
    
        int num = 0;
    
        using (IEnumerator<TSource> enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
                num++;
        }
    
        return num;
    }
    

    Remarkably, even after making this change, the arrays were still slower on my system, despite the non-arrays having to make the extra cast!

    Results (release build) for me were:

    Function                      Count()
    List<int>                     1.753
    int[]                         2.304
    

    I'm at a total loss to explain this last result...

    0 讨论(0)
提交回复
热议问题