I would like to compare two collections (in C#), but I\'m not sure of the best way to implement this efficiently.
I\'ve read the other thread about Enumerable.Sequen
static bool SetsContainSameElements<T>(IEnumerable<T> set1, IEnumerable<T> set2) {
var setXOR = new HashSet<T>(set1);
setXOR.SymmetricExceptWith(set2);
return (setXOR.Count == 0);
}
Solution requires .NET 3.5 and the System.Collections.Generic
namespace. According to Microsoft, SymmetricExceptWith
is an O(n + m) operation, with n representing the number of elements in the first set and m representing the number of elements in the second. You could always add an equality comparer to this function if necessary.
Why not use .Except()
// Create the IEnumerable data sources.
string[] names1 = System.IO.File.ReadAllLines(@"../../../names1.txt");
string[] names2 = System.IO.File.ReadAllLines(@"../../../names2.txt");
// Create the query. Note that method syntax must be used here.
IEnumerable<string> differenceQuery = names1.Except(names2);
// Execute the query.
Console.WriteLine("The following lines are in names1.txt but not names2.txt");
foreach (string s in differenceQuery)
Console.WriteLine(s);
http://msdn.microsoft.com/en-us/library/bb397894.aspx
If you use Shouldly, you can use ShouldAllBe with Contains.
collection1 = {1, 2, 3, 4};
collection2 = {2, 4, 1, 3};
collection1.ShouldAllBe(item=>collection2.Contains(item)); // true
And finally, you can write an extension.
public static class ShouldlyIEnumerableExtensions
{
public static void ShouldEquivalentTo<T>(this IEnumerable<T> list, IEnumerable<T> equivalent)
{
list.ShouldAllBe(l => equivalent.Contains(l));
}
}
UPDATE
A optional parameter exists on ShouldBe method.
collection1.ShouldBe(collection2, ignoreOrder: true); // true
erickson is almost right: since you want to match on counts of duplicates, you want a Bag. In Java, this looks something like:
(new HashBag(collection1)).equals(new HashBag(collection2))
I'm sure C# has a built-in Set implementation. I would use that first; if performance is a problem, you could always use a different Set implementation, but use the same Set interface.
You could use a Hashset. Look at the SetEquals method.
EDIT: I realized as soon as I posed that this really only works for sets -- it will not properly deal with collections that have duplicate items. For example { 1, 1, 2 } and { 2, 2, 1 } will be considered equal from this algorithm's perspective. If your collections are sets (or their equality can be measured that way), however, I hope you find the below useful.
The solution I use is:
return c1.Count == c2.Count && c1.Intersect(c2).Count() == c1.Count;
Linq does the dictionary thing under the covers, so this is also O(N). (Note, it's O(1) if the collections aren't the same size).
I did a sanity check using the "SetEqual" method suggested by Daniel, the OrderBy/SequenceEquals method suggested by Igor, and my suggestion. The results are below, showing O(N*LogN) for Igor and O(N) for mine and Daniel's.
I think the simplicity of the Linq intersect code makes it the preferable solution.
__Test Latency(ms)__
N, SetEquals, OrderBy, Intersect
1024, 0, 0, 0
2048, 0, 0, 0
4096, 31.2468, 0, 0
8192, 62.4936, 0, 0
16384, 156.234, 15.6234, 0
32768, 312.468, 15.6234, 46.8702
65536, 640.5594, 46.8702, 31.2468
131072, 1312.3656, 93.7404, 203.1042
262144, 3765.2394, 187.4808, 187.4808
524288, 5718.1644, 374.9616, 406.2084
1048576, 11420.7054, 734.2998, 718.6764
2097152, 35090.1564, 1515.4698, 1484.223