Background:
Linq-To-Objects has the extension method Count() (the overload not taking a predicate). Of course, sometimes when a method requires only an IEnumerable<out T> (to do Linq), we will really pass a "richer" object to it, such as an ICollection<T>. In that situation it would be wasteful to actually iterate through the entire collection (i.e. get the enumerator and "move next" a whole bunch of times) to determine the count, for there is a property ICollection<T>.Count for this purpose. And this "shortcut" has been used in the BCL since the beginning of Linq.
Now, since .NET 4.5 (of 2012), there is another very nice interface, namely IReadOnlyCollection<out T>. It is like ICollection<T>, except that it only includes members in which T appears as output, never as input (in fact just Count plus the inherited GetEnumerator()). For that reason it can be covariant in T ("out T"), just like IEnumerable<out T>, and that is really nice when item types can be more or less derived. But the new interface has its own property, IReadOnlyCollection<out T>.Count. See elsewhere on SO why these Count properties are distinct (instead of just one property).
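A minimal sketch of the variance difference just described, using string and object as stand-ins for a more derived and a less derived item type (variance only applies to reference types). The commented-out line is the invariant ICollection<T> case, which the compiler rejects.

```csharp
using System;
using System.Collections.Generic;

var strings = new List<string> { "a", "b", "c" };

// IReadOnlyCollection<out T> is covariant, so a List<string>
// converts implicitly to an IReadOnlyCollection<object>.
IReadOnlyCollection<object> readOnlyView = strings;

// ICollection<T> is invariant (it has Add(T item)), so this would not compile:
// ICollection<object> collectionView = strings;

// A covariant conversion is a reference conversion: no copy is made.
Console.WriteLine(ReferenceEquals(readOnlyView, strings)); // prints True
Console.WriteLine(readOnlyView.Count);                     // prints 3
```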
The question:
Linq's method Enumerable.Count(this source) does check for ICollection<T>.Count, but it does not check for IReadOnlyCollection<out T>.Count.
Given that it is really natural and common to use Linq on read-only collections, would it be a good idea to change the BCL to check for both interfaces? I guess it would require one additional type check.
And would that be a breaking change (given that they did not "remember" to do this from the 4.5 version where the new interface was introduced)?
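A rough sketch of what the suggested extra check could look like. CountWithReadOnlyCheck is a hypothetical name, not a BCL method, and the sketch covers only the two checks discussed here. It is written as a static local function rather than an extension method so that the whole thing runs as a single top-level program.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the proposal; NOT the BCL's Enumerable.Count.
static int CountWithReadOnlyCheck<TSource>(IEnumerable<TSource> source)
{
    if (source == null) throw new ArgumentNullException(nameof(source));

    // The shortcut Count() already takes: the invariant ICollection<TSource>.
    if (source is ICollection<TSource> collection) return collection.Count;

    // The proposed extra type check: the covariant IReadOnlyCollection<TSource>.
    if (source is IReadOnlyCollection<TSource> readOnly) return readOnly.Count;

    // Fallback: enumerate the whole sequence, counting elements.
    int count = 0;
    using (IEnumerator<TSource> e = source.GetEnumerator())
    {
        checked
        {
            while (e.MoveNext()) count++;
        }
    }
    return count;
}

// Queue<T> implements IReadOnlyCollection<T> but not ICollection<T>,
// so in this sketch it is the second check that returns the count.
var queue = new Queue<string>();
queue.Enqueue("a");
queue.Enqueue("b");
Console.WriteLine(CountWithReadOnlyCheck(queue)); // prints 2
```

The order of the checks keeps the existing ICollection<T> path first, so the extra cost only falls on sources that fail that test.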
Sample code
Run the code:
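using System;
using System.Collections.Generic;
using System.Linq; // for the Count() extension method

var x = new MyColl();       // implements IReadOnlyCollection<Guid> only
if (x.Count() == 1000000000)
{
}
var y = new MyOtherColl();  // implements ICollection<Guid>
if (y.Count() == 1000000000)
{
}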
where MyColl is a type implementing IReadOnlyCollection<> but not ICollection<>, and where MyOtherColl is a type implementing ICollection<>. Specifically I used the simple/minimal classes:
class MyColl : IReadOnlyCollection<Guid>
{
    public int Count
    {
        get
        {
            Console.WriteLine("MyColl.Count called");
            // Just for testing, implementation irrelevant:
            return 0;
        }
    }

    public IEnumerator<Guid> GetEnumerator()
    {
        Console.WriteLine("MyColl.GetEnumerator called");
        // Just for testing, implementation irrelevant:
        return ((IReadOnlyCollection<Guid>)(new Guid[] { })).GetEnumerator();
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        Console.WriteLine("MyColl.System.Collections.IEnumerable.GetEnumerator called");
        return GetEnumerator();
    }
}
class MyOtherColl : ICollection<Guid>
{
    public int Count
    {
        get
        {
            Console.WriteLine("MyOtherColl.Count called");
            // Just for testing, implementation irrelevant:
            return 0;
        }
    }

    public bool IsReadOnly
    {
        get
        {
            return true;
        }
    }

    public IEnumerator<Guid> GetEnumerator()
    {
        Console.WriteLine("MyOtherColl.GetEnumerator called");
        // Just for testing, implementation irrelevant:
        return ((IReadOnlyCollection<Guid>)(new Guid[] { })).GetEnumerator();
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        Console.WriteLine("MyOtherColl.System.Collections.IEnumerable.GetEnumerator called");
        return GetEnumerator();
    }

    public bool Contains(Guid item) { throw new NotImplementedException(); }
    public void CopyTo(Guid[] array, int arrayIndex) { throw new NotImplementedException(); }
    public bool Remove(Guid item) { throw new NotSupportedException(); }
    public void Add(Guid item) { throw new NotSupportedException(); }
    public void Clear() { throw new NotSupportedException(); }
}
and got the output:

MyColl.GetEnumerator called
MyOtherColl.Count called

from the code run, which shows that the "shortcut" was not used in the first case (IReadOnlyCollection<out T>). The same result is seen in 4.5 and 4.5.1.
UPDATE after a comment elsewhere on Stack Overflow by user supercat.
Linq was introduced in .NET 3.5 (2008), of course, and IReadOnlyCollection<> was introduced only in .NET 4.5 (2012). However, in between, another feature, covariance in generics, was introduced in .NET 4.0 (2010). As I said above, IEnumerable<out T> became a covariant interface. But ICollection<T> stayed invariant in T (since it contains members like void Add(T item);).
Already in 2010 (.NET 4) this had the consequence that if Linq's Count extension method was used on a source of compile-time type IEnumerable<Animal> whose actual run-time type was, say, List<Cat> (which is surely an IEnumerable<Cat> but also, by covariance, an IEnumerable<Animal>), then the "shortcut" was not used. The Count extension method checks only whether the run-time type is an ICollection<Animal>, which it is not (no covariance). It can't check for ICollection<Cat>: how would it know what a Cat is, when its TSource parameter equals Animal?
Let me give an example:
static void ProcessAnimals(IEnumerable<Animal> animals)
{
    int count = animals.Count(); // Linq extension Enumerable.Count<Animal>(animals)
    // ...
}
then:
List<Animal> li1 = GetSome_HUGE_ListOfAnimals();
ProcessAnimals(li1); // fine, will use shortcut to ICollection<Animal>.Count property
List<Cat> li2 = GetSome_HUGE_ListOfCats();
ProcessAnimals(li2); // works, but suboptimal, will iterate through entire List<> to find count
My suggested check for IReadOnlyCollection<out T> would "repair" this issue too, since that is a covariant interface which is implemented by List<T>.
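The two run-time type tests at stake can be observed directly. This sketch uses ArgumentException and Exception as stand-ins for Cat and Animal (ArgumentException derives from Exception), so no extra class declarations are needed. It only shows which interface casts succeed; it makes no claim about the internals of Enumerable.Count beyond the ICollection<T> check described above.

```csharp
using System;
using System.Collections.Generic;

// Stand-ins for the Cat/Animal pair: ArgumentException derives from Exception.
var cats = new List<ArgumentException>
{
    new ArgumentException("a"),
    new ArgumentException("b"),
};
IEnumerable<Exception> animals = cats; // allowed: IEnumerable<out T> is covariant

// The generic test described above fails: ICollection<T> is invariant.
Console.WriteLine(animals is ICollection<Exception>);         // prints False

// The proposed extra test succeeds by covariance.
Console.WriteLine(animals is IReadOnlyCollection<Exception>); // prints True

if (animals is IReadOnlyCollection<Exception> readOnly)
{
    Console.WriteLine(readOnly.Count); // prints 2, read from the property
}
```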
Conclusion:
- Also checking for IReadOnlyCollection<TSource> would be beneficial in cases where the run-time type of source implements IReadOnlyCollection<> but not ICollection<>, because the underlying collection class insists on being a read-only collection type and therefore does not want to implement ICollection<>.
- (new) Also checking for IReadOnlyCollection<TSource> is beneficial even when the type of source is both ICollection<> and IReadOnlyCollection<>, if generic covariance applies. Specifically, the IEnumerable<TSource> may really be an ICollection<SomeSpecializedSourceClass> where SomeSpecializedSourceClass is convertible by reference conversion to TSource. ICollection<> is not covariant. However, the check for IReadOnlyCollection<TSource> will work by covariance; any IReadOnlyCollection<SomeSpecializedSourceClass> is also an IReadOnlyCollection<TSource>, and the shortcut will be utilized.
- The cost is one additional run-time type check per call to Linq's Count method.
In many cases a class that implements IReadOnlyCollection<T> will also implement ICollection<T>. So you will still profit from the Count property shortcut. See ReadOnlyCollection<T>, for example:
public class ReadOnlyCollection<T> : IList<T>,
ICollection<T>, IList, ICollection, IReadOnlyList<T>, IReadOnlyCollection<T>,
IEnumerable<T>, IEnumerable
Since it's bad practice to check for other interfaces to get access beyond the given read-only interface, it should be OK this way.
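A quick check of the ReadOnlyCollection<T> point, assuming System.Collections.ObjectModel.ReadOnlyCollection<T> as declared above: the read-only wrapper still passes the ICollection<T> type test, and its ICollection<T>.IsReadOnly reports true.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

var wrapper = new ReadOnlyCollection<int>(new List<int> { 1, 2, 3 });

// Read-only, yet still an ICollection<int>, so the ICollection<T> test succeeds.
Console.WriteLine(wrapper is ICollection<int>);            // prints True
Console.WriteLine(((ICollection<int>)wrapper).IsReadOnly); // prints True

// The Linq extension method, not the Count property.
Console.WriteLine(wrapper.Count());                        // prints 3
```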
Implementing an additional type check for IReadOnlyCollection<T> in Count() would add ballast to every call on an object that doesn't implement IReadOnlyCollection<T>.
Based on the MSDN documentation, ICollection<T> is the only type that gets this special treatment:
If the type of source implements ICollection<T>, that implementation is used to obtain the count of elements. Otherwise, this method determines the count.
I'm guessing they didn't see it as worthwhile to mess with the LINQ codebase (and its spec) for the sake of this optimization. There are lots of CLR types that have their own Count property, but LINQ can't account for all of them.
Source: https://stackoverflow.com/questions/22940167/linqs-enumerable-count-method-checks-for-icollection-but-not-for-ireadonlycol