I'm having trouble coming up with the most efficient algorithm to remove duplicates from List<List<int>>
, for example (I know this looks like a list of int[]
, but just doing it that way for visual purposes:
my_list[0]= {1, 2, 3};
my_list[1]= {1, 2, 3};
my_list[2]= {9, 10, 11};
my_list[3]= {1, 2, 3};
So the output would just be
new_list[0]= {1, 2, 3};
new_list[1]= {9, 10, 11};
Let me know if you have any ideas. I would really appreciate it.
Build custom of EqualityComparer<List<int>>
:
public class CusComparer : IEqualityComparer<List<int>>
{
public bool Equals(List<int> x, List<int> y)
{
return x.SequenceEqual(y);
}
public int GetHashCode(List<int> obj)
{
int hashCode = 0;
for (var index = 0; index < obj.Count; index++)
{
hashCode ^= new {Index = index, Item = obj[index]}.GetHashCode();
}
return hashCode;
}
}
Then you can get the result by using Distinct with custom comparer method:
var result = my_list.Distinct(new CusComparer());
Edit:
Include the index into method GetHashCode
to make sure different orders will not be equal
This simple program does what you want:
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApplication6
{
class Program
{
static void Main(string[] args)
{
List<List<int>> lists = new List<List<int>>();
lists.Add(new List<int> { 1, 2, 3 });
lists.Add(new List<int> { 1, 2, 3 });
lists.Add(new List<int> { 9, 10, 11 });
lists.Add(new List<int> { 1, 2, 3 });
var distinct = lists.Select(x => new HashSet<int>(x))
.Distinct(HashSet<int>.CreateSetComparer());
foreach (var list in distinct)
{
foreach (var v in list)
{
Console.Write(v + " ");
}
Console.WriteLine();
}
}
}
}
var finalList = lists.GroupBy(x => String.Join(",", x))
.Select(x => x.First().ToList())
.ToList();
You can use the LINQ Distinct
overload that takes a comparer. The comparer should see if the lists are equal. Note that the default equals operations of lists won't do what you're really looking for, so the comparer will need to loop through each for you. Here's an example of such a comparer:
public class SequenceComparer<T> : IEqualityComparer<IEnumerable<T>>
{
IEqualityComparer<T> itemComparer;
public SequenceComparer()
{
this.itemComparer = EqualityComparer<T>.Default;
}
public SequenceComparer(IEqualityComparer<T> itemComparer)
{
this.itemComparer = itemComparer;
}
public bool Equals(IEnumerable<T> x, IEnumerable<T> y)
{
if (object.Equals(x, y))
return true;
if (x == null || y == null)
return false;
return x.SequenceEqual(y, itemComparer);
}
public int GetHashCode(IEnumerable<T> obj)
{
if (obj == null)
return -1;
int i = 0;
return obj.Aggregate(0, (x, y) => x ^ new { Index = i++, ItemHash = itemComparer.GetHashCode(y) }.GetHashCode());
}
}
Update: I got the idea of using an anonymous type to make a better hash from Cuong Le's answer, and I LINQ-ified it and made it work in my class.
For small sets of data, a comparer could be useful, but if you have 1000 or more List> then trying to compare them all could begin to take a long amount of time.
I suggest that you instead use your data to build a distinct tree. The building of the tree will be much faster and when you are done you can always bring your data back into your old data structure.
I wanted to compare the performance of the answers of @Leniel Macaferi and @L.B as I wasn't sure which would be more performant, or if the difference would be significant. It turns out that the difference is very significant:
Method 1: 00:00:00.0976649 @Leniel Macaferi
Method 2: 00:00:32.0961650 @L.B
Here is the code I used to benchmark them:
public static void Main(string[] args)
{
var list = new List<List<int>> {new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3,}, new List<int> {1, 2, 31, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 6}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 9, 10, 11, 1}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 9}, new List<int> {1, 2, 31, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 6, 7}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 9, 10, 11}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3,}, new List<int> {1, 2, 31, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 6}, new List<int> {1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 9, 10, 11}};
var sw1 = new Stopwatch();
sw1.Start();
for (var i = 0; i < 1_000_000; i++)
{
var distinct = list.Select(x => new HashSet<int>(x)).Distinct(HashSet<int>.CreateSetComparer());
}
sw1.Stop();
Console.WriteLine($"Method 1: {sw1.Elapsed}");
var sw2 = new Stopwatch();
sw2.Start();
for (var i = 0; i < 1_000_000; i++)
{
var distinct = list.GroupBy(a => string.Join(",", a)).Select(a => a.First()).ToList();
}
sw2.Stop();
Console.WriteLine($"Method 2: {sw2.Elapsed}");
Console.ReadKey();
}
来源:https://stackoverflow.com/questions/12784788/c-sharp-remove-duplicates-from-listlistint