I'm pretty sure this is a duplicate, but I have tried everything, and I still cannot seem to get the differences. I have two lists of strings: listA and listB. I'm trying to
All the code you posted should work fine, so the error is somewhere else. However, you wrote "these take a really long time", so I suppose you have a performance issue.
Let's do a very quick and dirty comparison (a good performance test is a long process; self-promotion: the benchmark was done with this free tool). Assumptions:
SymmetricExceptWith (and if not, then its result is pretty different compared to Except). If it was a mistake, just ignore the tests for SymmetricExceptWith.
Two lists of 20,000 random items (each test repeated 100 times, then averaged; release mode).
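To make the first assumption concrete, here is a minimal sketch of how the two operations differ in what they return. The class and method names (DifferenceDemo, OneSided, Symmetric) are made up for illustration and do not come from the question.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class DifferenceDemo
{
    // One-sided difference: items that are in a but not in b.
    public static List<string> OneSided(IEnumerable<string> a, IEnumerable<string> b)
    {
        return a.Except(b).ToList();
    }

    // Symmetric difference: items that are in exactly one of the two sequences.
    public static HashSet<string> Symmetric(IEnumerable<string> a, IEnumerable<string> b)
    {
        var set = new HashSet<string>(a); // copy, so the caller's data is not modified
        set.SymmetricExceptWith(b);       // modifies set in place
        return set;
    }
}
```

For a = { "x", "y", "z" } and b = { "y", "z", "w" }, OneSided returns only "x", while Symmetric returns "x" and "w".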
Method                    Time [ms]
Contains *1               49.4
Contains *2               49.0
Except                    5.9
SymmetricExceptWith *3    4.1
SymmetricExceptWith *4    2.5
Notes:
1 Loop with foreach
2 Loop with for
3 HashSet creation measured
4 HashSet creation not measured. I included this for reference, but unless your first list is already a HashSet you can't ignore the creation time.
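The exact benchmark code is not shown here, so the following is only a guess at what the two Contains variants in notes 1 and 2 might look like, assuming both inputs are List<string>. The WithHashSet method is an extra added for contrast: it keeps the same loop but does the lookup in a HashSet. All names are hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical reconstruction of the compared loops, for illustration only.
public static class ContainsLoop
{
    // Note 1: foreach loop. List<T>.Contains scans b linearly on every call.
    public static List<string> WithForeach(List<string> a, List<string> b)
    {
        var result = new List<string>();
        foreach (var item in a)
        {
            if (!b.Contains(item))
                result.Add(item);
        }
        return result;
    }

    // Note 2: same logic with an indexed for loop.
    public static List<string> WithFor(List<string> a, List<string> b)
    {
        var result = new List<string>();
        for (int i = 0; i < a.Count; i++)
        {
            if (!b.Contains(a[i]))
                result.Add(a[i]);
        }
        return result;
    }

    // Contrast (not one of the measured variants): the lookup goes through a HashSet,
    // so each membership test is a hash lookup instead of a scan of b.
    public static List<string> WithHashSet(List<string> a, List<string> b)
    {
        var lookup = new HashSet<string>(b);
        var result = new List<string>();
        foreach (var item in a)
        {
            if (!lookup.Contains(item))
                result.Add(item);
        }
        return result;
    }
}
```

The two list-based loops do the same amount of work, which is consistent with their nearly identical timings; the cost is in the repeated linear search, not in the loop construct.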
We see that the Contains() method is pretty slow, so we can drop it from the bigger tests (I checked anyway, and its performance doesn't get better or even comparable). Let's see what happens with lists of 1,000,000 items.
Method                 Time [ms]
Except                 244.4
SymmetricExceptWith    259.0
Let's try to make it parallel (please note that for this test I'm using an old Core 2 Duo 2 GHz):
Method                             Time [ms]
Except                             244.4
SymmetricExceptWith                259.0
Except (parallel partitions)       301.8
SymmetricExceptWith (p. p.)        382.6
Except (AsParallel)                274.4
Parallel performance is worse here, and LINQ Except is now the best option. Let's see how it works on a better CPU (Xeon 2.8 GHz, quad core). Also note that with such a big amount of data, cache size won't affect the test too much.
Method                             Time [ms]
Except                             127.4
SymmetricExceptWith                149.2
Except (parallel partitions)       208.0
SymmetricExceptWith (p. p.)        170.0
Except (AsParallel)                80.2
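How the "parallel partitions" variants were implemented is not shown, so the following is only one possible reading, sketched under that assumption: split the first list into index ranges, filter each range against a shared read-only HashSet, and merge the per-thread results. The class name PartitionedDifference and its Compute method are hypothetical.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical sketch; not the code that produced the timings above.
public static class PartitionedDifference
{
    // Returns the items of a that are not in b.
    // Unlike Except(), duplicates in a are kept and the output order is not guaranteed.
    public static List<string> Compute(IList<string> a, IEnumerable<string> b)
    {
        var result = new List<string>();
        if (a.Count == 0)
            return result; // Partitioner.Create rejects an empty range

        var lookup = new HashSet<string>(b); // only read from inside the loop
        var gate = new object();

        Parallel.ForEach(
            Partitioner.Create(0, a.Count),      // index ranges over a
            () => new List<string>(),            // one local buffer per worker
            (range, state, local) =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    if (!lookup.Contains(a[i]))
                        local.Add(a[i]);
                }
                return local;
            },
            local =>
            {
                lock (gate)                      // merge each local buffer once
                {
                    result.AddRange(local);
                }
            });

        return result;
    }
}
```

Per-worker buffers keep locking down to one merge per worker, but the hash set still has to be built sequentially first, which may be part of why partitioning does not pay off in the timings above.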
To summarize: for relatively small lists SymmetricExceptWith() will perform better; for big lists Except() is always better. If you're targeting a modern multi-core CPU, then a parallel implementation will scale much better. In code:
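// Sequential version
var c = a.Except(b).ToList();
// Parallel version (an alternative to the line above, not meant for the same scope)
var c = a.AsParallel().Except(b.AsParallel()).ToList();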
Please note that if you don't need a List as the result and an IEnumerable is enough, then performance will greatly increase (and the difference with parallel execution will be higher).
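As a rough illustration of the IEnumerable point, the sketch below consumes the result of Except() directly instead of copying it into a List first. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers, for illustration only.
public static class LazyDifference
{
    // Walks the difference once without materializing it into a list.
    public static int CountMissing(IEnumerable<string> a, IEnumerable<string> b)
    {
        int count = 0;
        foreach (var item in a.Except(b))
        {
            count++;
        }
        return count;
    }

    // Stops enumerating a as soon as the first differing item is found.
    public static bool HasAnyMissing(IEnumerable<string> a, IEnumerable<string> b)
    {
        return a.Except(b).Any();
    }
}
```

Except() is evaluated lazily, so nothing is computed until the sequence is enumerated, and enumerating it a second time repeats the work. If the result is needed more than once, ToList() is still the safer choice.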
Of course the two Except() one-liners above are not optimal and can be greatly improved (and if it's really performance critical, you may pick the ParallelEnumerable.Except() implementation as a starting point for your own specific, highly optimized routine).