Linq's Enumerable.Count method checks for ICollection<> but not for IReadOnlyCollection<>

Posted by ぐ巨炮叔叔 on 2019-12-07 18:52:00

Question


Background:

Linq-To-Objects has the extension method Count() (the overload not taking a predicate). Of course sometimes when a method requires only an IEnumerable<out T> (to do Linq), we will really pass a "richer" object to it, such as an ICollection<T>. In that situation it would be wasteful to actually iterate through the entire collection (i.e. get the enumerator and "move next" a whole bunch of times) to determine the count, for there is a property ICollection<T>.Count for this purpose. And this "shortcut" has been used in the BCL since the beginning of Linq.

Now, since .NET 4.5 (of 2012), there is another very nice interface, namely IReadOnlyCollection<out T>. It is like ICollection<T> except that it only includes those members that return a T. For that reason it can be covariant in T ("out T"), just like IEnumerable<out T>, and that is really nice when item types can be more or less derived. But the new interface has its own property, IReadOnlyCollection<out T>.Count. See elsewhere on SO why these Count properties are distinct (instead of just one property).

The question:

Linq's method Enumerable.Count(this source) does check for ICollection<T>.Count, but it does not check for IReadOnlyCollection<out T>.Count.

Given that it is really natural and common to use Linq on read-only collections, would it be a good idea to change the BCL to check for both interfaces? I guess it would require one additional type check.

And would that be a breaking change (given that they did not "remember" to do this from the 4.5 version where the new interface was introduced)?
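To make the proposal concrete, here is a minimal sketch of what a Count with both checks might look like. It is not the BCL implementation: the class CountSketch and the method name CountWithReadOnlyCheck are made up for illustration, and the pattern-matching syntax assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;

static class CountSketch
{
    // Hypothetical helper for illustration only, not the BCL's Enumerable.Count.
    public static int CountWithReadOnlyCheck<TSource>(this IEnumerable<TSource> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        // The existing shortcut described in the question.
        if (source is ICollection<TSource> collection)
            return collection.Count;

        // The one additional type check being proposed.
        if (source is IReadOnlyCollection<TSource> readOnly)
            return readOnly.Count;

        // Fallback: walk the whole sequence and count the items.
        int count = 0;
        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            checked
            {
                while (e.MoveNext()) count++;
            }
        }
        return count;
    }
}
```

With a helper like this, a type such as Stack<T>, which implements IReadOnlyCollection<T> but not the generic ICollection<T>, would be counted through its Count property instead of being enumerated.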

Sample code

Run the code:

    var x = new MyColl();
    if (x.Count() == 1000000000)
    {
    }

    var y = new MyOtherColl();
    if (y.Count() == 1000000000)
    {
    }

where MyColl is a type implementing IReadOnlyCollection<> but not ICollection<>, and where MyOtherColl is a type implementing ICollection<>. Specifically I used the simple/minimal classes:

class MyColl : IReadOnlyCollection<Guid>
{
  public int Count
  {
    get
    {
      Console.WriteLine("MyColl.Count called");
      // Just for testing, implementation irrelevant:
      return 0;
    }
  }

  public IEnumerator<Guid> GetEnumerator()
  {
    Console.WriteLine("MyColl.GetEnumerator called");
    // Just for testing, implementation irrelevant:
    return ((IReadOnlyCollection<Guid>)(new Guid[] { })).GetEnumerator();
  }

  System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
  {
    Console.WriteLine("MyColl.System.Collections.IEnumerable.GetEnumerator called");
    return GetEnumerator();
  }
}

class MyOtherColl : ICollection<Guid>
{
  public int Count
  {
    get
    {
      Console.WriteLine("MyOtherColl.Count called");
      // Just for testing, implementation irrelevant:
      return 0;
    }
  }

  public bool IsReadOnly
  {
    get
    {
      return true;
    }
  }

  public IEnumerator<Guid> GetEnumerator()
  {
    Console.WriteLine("MyOtherColl.GetEnumerator called");
    // Just for testing, implementation irrelevant:
    return ((IReadOnlyCollection<Guid>)(new Guid[] { })).GetEnumerator();
  }

  System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
  {
    Console.WriteLine("MyOtherColl.System.Collections.IEnumerable.GetEnumerator called");
    return GetEnumerator();
  }

  public bool Contains(Guid item) { throw new NotImplementedException(); }
  public void CopyTo(Guid[] array, int arrayIndex) { throw new NotImplementedException(); }
  public bool Remove(Guid item) { throw new NotSupportedException(); }
  public void Add(Guid item) { throw new NotSupportedException(); }
  public void Clear() { throw new NotSupportedException(); }
}

and got the output:

MyColl.GetEnumerator called
MyOtherColl.Count called

from the code run, which shows that the "shortcut" was not used in the first case (IReadOnlyCollection<out T>). The same result is seen in .NET 4.5 and 4.5.1.


UPDATE after comment elsewhere on Stack Overflow by user supercat.

Linq was introduced in .NET 3.5 (2008), of course, and IReadOnlyCollection<> was introduced only in .NET 4.5 (2012). However, in between, another feature, covariance in generics, was introduced in .NET 4.0 (2010). As I said above, IEnumerable<out T> became a covariant interface. But ICollection<T> stayed invariant in T (since it contains members like void Add(T item);).

Already in 2010 (.NET 4) this had the consequence that if Linq's Count extension method was used on a source of compile-time type IEnumerable<Animal> whose actual run-time type was, say, List<Cat>, which is surely an IEnumerable<Cat> but also, by covariance, an IEnumerable<Animal>, then the "shortcut" was not used. The Count extension method checks only whether the run-time type is an ICollection<Animal>, which it is not (no covariance). It can't check for ICollection<Cat> (how would it know what a Cat is, when its TSource parameter equals Animal?).

Let me give an example:

static void ProcessAnimals(IEnumerable<Animal> animals)
{
    int count = animals.Count();  // Linq extension Enumerable.Count<Animal>(animals)
    // ...
}

then:

List<Animal> li1 = GetSome_HUGE_ListOfAnimals();
ProcessAnimals(li1);  // fine, will use shortcut to ICollection<Animal>.Count property

List<Cat> li2 = GetSome_HUGE_ListOfCats();
ProcessAnimals(li2);  // works, but suboptimal, will iterate through entire List<> to find count

My suggested check for IReadOnlyCollection<out T> would "repair" this issue too, since that is one covariant interface which is implemented by List<T>.
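The difference between the two type checks can be observed directly. The sketch below uses minimal stand-in definitions of Animal and Cat (the question does not show them), and the helper names in CovarianceDemo are invented for this example.

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-ins for the types named in the question.
class Animal { }
class Cat : Animal { }

static class CovarianceDemo
{
    // Mirrors the check that Count performs today, with T playing the role of TSource.
    public static bool IsCollectionOf<T>(IEnumerable<T> source)
    {
        return source is ICollection<T>;
    }

    // Mirrors the additional check suggested in the question.
    public static bool IsReadOnlyCollectionOf<T>(IEnumerable<T> source)
    {
        return source is IReadOnlyCollection<T>;
    }

    public static void Show()
    {
        List<Cat> cats = new List<Cat> { new Cat(), new Cat() };
        IEnumerable<Animal> animals = cats;  // allowed because IEnumerable<out T> is covariant

        Console.WriteLine(IsCollectionOf(animals));          // False: ICollection<T> is invariant
        Console.WriteLine(IsReadOnlyCollectionOf(animals));  // True: IReadOnlyCollection<out T> is covariant
    }
}
```

Here T is inferred as Animal from the compile-time type of the argument, just as TSource is inside ProcessAnimals.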

Conclusion:

  1. Also checking for IReadOnlyCollection<TSource> would be beneficial in cases where the run-time type of source implements IReadOnlyCollection<> but not ICollection<> because the underlying collection class insists on being a read-only collection type and therefore wishes to not implement ICollection<>.
  2. (new) Also checking for IReadOnlyCollection<TSource> is beneficial even when the type of source is both ICollection<> and IReadOnlyCollection<>, if generic covariance applies. Specifically, the IEnumerable<TSource> may really be an ICollection<SomeSpecializedSourceClass> where SomeSpecializedSourceClass is convertible by reference conversion to TSource. ICollection<> is not covariant. However, the check for IReadOnlyCollection<TSource> will work by covariance; any IReadOnlyCollection<SomeSpecializedSourceClass> is also an IReadOnlyCollection<TSource>, and the shortcut will be utilized.
  3. The cost is one additional run-time type check per call to Linq's Count method.

Answer 1:


In many cases a class that implements IReadOnlyCollection<T> will also implement ICollection<T>. So you will still profit from the Count property shortcut.

See ReadOnlyCollection for example.

public class ReadOnlyCollection<T> : IList<T>, 
    ICollection<T>, IList, ICollection, IReadOnlyList<T>, IReadOnlyCollection<T>, 
    IEnumerable<T>, IEnumerable

Since it's bad practice to check for other interfaces to get access beyond the given read-only interface, it should be OK this way.

Implementing an additional type check for IReadOnlyCollection<T> in Count() would be additional ballast for every call on an object which doesn't implement IReadOnlyCollection<T>.
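As a quick check of the ReadOnlyCollection<T> case mentioned in this answer, the following sketch tests both interfaces on a wrapped list. The class and method names are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

static class ReadOnlyCollectionDemo
{
    // True when the run-time type offers both Count properties for the same T.
    public static bool ImplementsBoth<T>(IEnumerable<T> source)
    {
        return source is ICollection<T> && source is IReadOnlyCollection<T>;
    }

    public static void Show()
    {
        IEnumerable<int> wrapped = new ReadOnlyCollection<int>(new List<int> { 1, 2, 3 });

        Console.WriteLine(ImplementsBoth(wrapped));  // True
        Console.WriteLine(wrapped.Count());          // 3
    }
}
```

This only covers the case where the element type matches exactly; it does not address the covariant scenario described in the question's update.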




Answer 2:


Based on the MSDN documentation, ICollection<T> is the only type that gets this special treatment:

If the type of source implements ICollection<T>, that implementation is used to obtain the count of elements. Otherwise, this method determines the count.

I'm guessing they didn't see it as worthwhile to mess with the LINQ codebase (and its spec) for the sake of this optimization. There are lots of CLR types that have their own Count property, but LINQ can't account for all of them.



Source: https://stackoverflow.com/questions/22940167/linqs-enumerable-count-method-checks-for-icollection-but-not-for-ireadonlycol
