I just want know if a \"FindAll\" will be faster than a \"Where\" extentionMethod and why?
Example :
myList.FindAll(item=> item.category == 5);
Well, FindAll
copies the matching elements to a new list, whereas Where
just returns a lazily evaluated sequence - no copying is required.
I'd therefore expect Where
to be slightly faster than FindAll
even when the resulting sequence is fully evaluated - and of course the lazy evaluation strategy of Where
means that if you only look at (say) the first match, it won't need to check the remainder of the list. (As Matthew points out, there's work in maintaining the state machine for Where
. However, this will only have a fixed memory cost - whereas constructing a new list may require multiple array allocations etc.)
Basically, FindAll(predicate)
is closer to Where(predicate).ToList()
than to just Where(predicate)
.
Just to react a bit more to Matthew's answer, I don't think he's tested it quite thoroughly enough. His predicate happens to pick half the items. Here's a short but complete program which tests the same list but with three different predicates - one picks no items, one picks all the items, and one picks half of them. In each case I run the test fifty times to get longer timing.
I'm using Count()
to make sure that the Where
result is fully evaluated. The results show that collecting around half the results, the two are neck and neck. Collecting no results, FindAll
wins. Collecting all the results, Where
wins. I find this intriguing: all of the solutions become slower as more and more matches are found: FindAll
has more copying to do, and Where
has to return the matched values instead of just looping within the MoveNext()
implementation. However, FindAll
gets slower faster than Where
does, so loses its early lead. Very interesting.
Results:
FindAll: All: 11994
Where: All: 8176
FindAll: Half: 6887
Where: Half: 6844
FindAll: None: 3253
Where: None: 4891
(Compiled with /o+ /debug- and run from the command line, .NET 3.5.)
Code:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
class Test
{
static List<int> ints = Enumerable.Range(0, 10000000).ToList();
static void Main(string[] args)
{
Benchmark("All", i => i >= 0); // Match all
Benchmark("Half", i => i % 2 == 0); // Match half
Benchmark("None", i => i < 0); // Match none
}
static void Benchmark(string name, Predicate<int> predicate)
{
// We could just use new Func<int, bool>(predicate) but that
// would create one delegate wrapping another.
Func<int, bool> func = (Func<int, bool>)
Delegate.CreateDelegate(typeof(Func<int, bool>), predicate.Target,
predicate.Method);
Benchmark("FindAll: " + name, () => ints.FindAll(predicate));
Benchmark("Where: " + name, () => ints.Where(func).Count());
}
static void Benchmark(string name, Action action)
{
GC.Collect();
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 50; i++)
{
action();
}
sw.Stop();
Console.WriteLine("{0}: {1}", name, sw.ElapsedMilliseconds);
}
}
The advantage of where is the deferred execution. See the difference if you'd have the following functionality
BigSequence.FindAll( x => DoIt(x) ).First();
BigSequence.Where( x => DoIt(x) ).First();
FindAll has covered the complete sequene, while Where in most sequences will stop enumerating as soon as one element is found.
The same effects will be one using Any(), Take(), Skip(), etc. I'm not sure, but I guess you'll have huge advantages in all functions that have deferred execution
I can give some clue, but not sure which one faster. FindAll() is executed right away. Where() is defferred executed.
How about we test instead of guess? Shame to see the wrong answer get out.
var ints = Enumerable.Range(0, 10000000).ToList();
var sw1 = Stopwatch.StartNew();
var findall = ints.FindAll(i => i % 2 == 0);
sw1.Stop();
var sw2 = Stopwatch.StartNew();
var where = ints.Where(i => i % 2 == 0).ToList();
sw2.Stop();
Console.WriteLine("sw1: {0}", sw1.ElapsedTicks);
Console.WriteLine("sw2: {0}", sw2.ElapsedTicks);
/*
Debug
sw1: 1149856
sw2: 1652284
Release
sw1: 532194
sw2: 1016524
*/
Edit:
Even if I turn the above code from
var findall = ints.FindAll(i => i % 2 == 0);
...
var where = ints.Where(i => i % 2 == 0).ToList();
... to ...
var findall = ints.FindAll(i => i % 2 == 0).Count;
...
var where = ints.Where(i => i % 2 == 0).Count();
I get these results
/*
Debug
sw1: 1250409
sw2: 1267016
Release
sw1: 539536
sw2: 600361
*/
Edit 2.0...
If you want a list of the subset of the current list the fastest method if the FindAll(). The reason for this is simple. The FindAll instance method uses the indexer on the current List instead of the enumerator state machine. The Where() extension method is an external call to a different class that uses the enumerator. If you step from each node in the list to the next node you will have to call the MoveNext() method under the covers. As you can see from the above examples it is even faster to use the index entries to create a new list (that is pointing to the original items, so memory bloat will be minimal) to even just get a count of the filtered items.
Now if you are going to early abort from the Enumerator the Where() method could be faster. Of course if you move the early abort logic to the predicate of the FindAll() method you will again be using the indexer instead of the enumerator.
Now there are other reasons to use the Where() statement (such as the other linq methods, foreach blocks and many more) but the question was is the FindAll() faster than Where(). And unless you don't execute the Where() the answer seems to be yes. (When comparing apples to apples)
I am not say don't use LINQ or the .Where() method. They make for code that is much simpler to read. The question was about performance and not about how easy you can read and understand the code. By fast the fastest way to do this work would be to use a for block stepping each index and doing any logic as you want (even early exits). The reason LINQ is so great is becasue of the complex expression trees and transformation you can get with them. But using the iterator from the .Where() method has to go though tons of code to find it's way to a in memory statemachine that is just getting the next index out of the List. It should also be noted that this .FindAll() method is only useful on objects that implmented it (such as Array and List.)
Yet more...
for (int x = 0; x < 20; x++)
{
var ints = Enumerable.Range(0, 10000000).ToList();
var sw1 = Stopwatch.StartNew();
var findall = ints.FindAll(i => i % 2 == 0).Count;
sw1.Stop();
var sw2 = Stopwatch.StartNew();
var where = ints.AsEnumerable().Where(i => i % 2 == 0).Count();
sw2.Stop();
var sw4 = Stopwatch.StartNew();
var cntForeach = 0;
foreach (var item in ints)
if (item % 2 == 0)
cntForeach++;
sw4.Stop();
Console.WriteLine("sw1: {0}", sw1.ElapsedTicks);
Console.WriteLine("sw2: {0}", sw2.ElapsedTicks);
Console.WriteLine("sw4: {0}", sw4.ElapsedTicks);
}
/* averaged results
sw1 575446.8
sw2 605954.05
sw3 394506.4
/*
Well, at least you can try to measure it.
The static Where
method is implemented using an iterator bloc (yield
keyword), which basically means that the execution will be deferred. If you only compare the calls to theses two methods, the first one will be slower, since it immediately implies that the whole collection will be iterated.
But if you include the complete iteration of the results you get, things can be a bit different. I'm pretty sure the yield
solution is slower, due to the generated state machine mechanism it implies. (see @Matthew anwser)