Question
I have this function to repeat a sequence:
public static List<T> Repeat<T>(this IEnumerable<T> lst, int count)
{
    if (count < 0)
        throw new ArgumentOutOfRangeException("count");
    var ret = Enumerable.Empty<T>();
    for (var i = 0; i < count; i++)
        ret = ret.Concat(lst);
    return ret.ToList();
}
Now if I do:
var d = Enumerable.Range(1, 100);
var f = d.Select(t => new Person()).Repeat(10);
int i = f.Distinct().Count();
I expect i to be 100, but it's giving me 1000! My question, strictly, is: why is this happening? Shouldn't LINQ be smart enough to figure out that it's the first selected 100 persons I need to concatenate with the variable ret? I have a feeling that Concat is being given preference over Select when the query is executed at ret.ToList().
Edit:
If I do this I get the correct result as expected:
var f = d.Select(t => new Person()).ToList().Repeat(10);
int i = f.Distinct().Count(); //prints 100
Edit again:
I have not overridden Equals. I'm just trying to get 100 unique persons (by reference, of course). My question is: can someone explain why LINQ is not doing the Select operation first and then the concatenation (at the time of execution, of course)?
Answer 1:
The problem is that unless you call ToList, d.Select(t => new Person()) is re-enumerated each time Repeat goes through the list, creating a fresh set of Person objects on every pass. This behavior is known as deferred execution.
In general, LINQ does not assume that each time it enumerates a sequence it will get the same sequence, or even a sequence of the same length. If this effect is not desirable, you can always "materialize" the sequence inside your Repeat method by calling ToList right away, like this:
public static List<T> Repeat<T>(this IEnumerable<T> lstEnum, int count) {
    if (count < 0)
        throw new ArgumentOutOfRangeException("count");
    var lst = lstEnum.ToList(); // Enumerate only once
    var ret = Enumerable.Empty<T>();
    for (var i = 0; i < count; i++)
        ret = ret.Concat(lst);
    return ret.ToList();
}
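To see the re-enumeration directly, here is a minimal sketch that counts how often the selector runs. The counter variable created and the empty Person class are assumptions added for illustration; they are not part of the original code.

```csharp
using System;
using System.Linq;

class Person { }

static class Program
{
    static void Main()
    {
        int created = 0; // hypothetical counter: how many times the selector has run

        // Deferred: building the query creates no Person objects.
        var query = Enumerable.Range(1, 100)
                              .Select(_ => { created++; return new Person(); });
        Console.WriteLine(created); // 0

        // Each full enumeration runs the selector again for all 100 elements.
        var first = query.ToList();
        var second = query.ToList();
        Console.WriteLine(created);                               // 200
        Console.WriteLine(ReferenceEquals(first[0], second[0]));  // False

        // Materialize once; later passes over the list reuse the same objects.
        var cached = query.ToList();
        var third = cached.ToList();
        var fourth = cached.ToList();
        Console.WriteLine(created);                               // 300
        Console.WriteLine(ReferenceEquals(third[0], fourth[0]));  // True
    }
}
```

The two passes over query produce different objects, while the two passes over cached return the same references, which is why calling ToList before repeating gives 100 distinct persons.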
Answer 2:
I could break my problem down to something simpler:
var d = Enumerable.Range(1, 100);
var f = d.Select(t => new Person());
Now essentially I am doing this:
f = f.Concat(f);
Mind you, the query hasn't been executed up to this point. At the time of execution, f is still the unexecuted d.Select(t => new Person()). So the last statement, at the time of execution, can be broken down to:
f = f.Concat(f);
//which is
f = d.Select(t => new Person()).Concat(d.Select(t => new Person()));
which obviously creates 100 + 100 = 200 new Person instances. So
f.Distinct().ToList(); //yields 200, not 100
which is the correct behaviour.
Edit: I could rewrite the extension method as simply as:
public static IEnumerable<T> Repeat<T>(this IEnumerable<T> source, int times)
{
    source = source.ToArray();
    return Enumerable.Range(0, times).SelectMany(_ => source);
}
I used dasblinkenlight's suggestion to fix the issue.
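As a quick check of the rewritten method, the sketch below runs it on the scenario from the question. The wrapper class name SequenceExtensions and the empty Person class are assumptions made so the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Person { }

static class SequenceExtensions // hypothetical container for the extension method
{
    public static IEnumerable<T> Repeat<T>(this IEnumerable<T> source, int times)
    {
        // Runs immediately, so the selector executes exactly once per element.
        source = source.ToArray();
        // Deferred, but it only re-reads the array captured above.
        return Enumerable.Range(0, times).SelectMany(_ => source);
    }
}

static class Program
{
    static void Main()
    {
        var repeated = Enumerable.Range(1, 100)
                                 .Select(_ => new Person())
                                 .Repeat(10)
                                 .ToList();

        Console.WriteLine(repeated.Count);              // 1000
        Console.WriteLine(repeated.Distinct().Count()); // 100
    }
}
```

Because this Repeat is not an iterator method, the ToArray call happens when Repeat is invoked rather than when the result is enumerated, so the source is snapshotted at call time.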
Answer 3:
Each Person object is a separate object. All 1000 are distinct.
What is the definition of equality for the Person type? If you don't override it, that definition will be reference equality, meaning all 1000 objects are distinct.
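To illustrate how the equality definition affects Distinct, here is a small sketch. NamedPerson is a hypothetical class invented for this example that overrides Equals and GetHashCode; the original Person does not do this.

```csharp
using System;
using System.Linq;

class Person { } // no Equals override: compared by reference

class NamedPerson // hypothetical type with value-based equality on Name
{
    public string Name { get; }
    public NamedPerson(string name) { Name = name; }

    public override bool Equals(object obj) =>
        obj is NamedPerson other && other.Name == Name;

    public override int GetHashCode() => Name.GetHashCode();
}

static class Program
{
    static void Main()
    {
        // Three separate instances, reference equality: all distinct.
        var plain = Enumerable.Range(1, 3).Select(_ => new Person()).ToList();
        Console.WriteLine(plain.Distinct().Count()); // 3

        // Three separate instances that compare equal by Name: collapsed to one.
        var named = Enumerable.Range(1, 3).Select(_ => new NamedPerson("Ann")).ToList();
        Console.WriteLine(named.Distinct().Count()); // 1
    }
}
```

With the default reference equality, the only way to get 100 distinct results out of 1000 is for the repeated passes to return the same 100 instances.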
Source: https://stackoverflow.com/questions/13500641/is-the-order-of-execution-of-linq-the-reason-for-this-catch