I have a MyObject class with the fields id, a, b, c, e, f, and a List with 500,000 items. How can I remove all duplicate items that have the same values for a, c and f?
Well, you can always use LINQ's Distinct(), like this:
var matches = list.Distinct(new Comparer()).ToList();
But for Distinct() to work this way, you need to implement a comparer for your class:
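class Comparer : IEqualityComparer<MyObject>
{
    // Two objects count as duplicates when a, c and f all match.
    public bool Equals(MyObject x, MyObject y)
    {
        return x.a == y.a && x.c == y.c && x.f == y.f;
    }

    // Objects that are equal must return the same hash code.
    // Combining the fields with + is simple but collision-prone.
    public int GetHashCode(MyObject obj)
    {
        return (obj.a + obj.c + obj.f).GetHashCode();
    }
}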
If what you are looking for is speed, and you don't mind using up some memory, then I would recommend that you use a HashSet. If you are interested in doing some custom comparison, then you can make an IEqualityComparer<T>, something like this:
var original = new List<YourClass>(); // whatever your original collection is
var unique = new HashSet<YourClass>(new MyCustomEqualityComparer());
foreach (var item in original)
{
    // Add returns false and leaves the set unchanged if an equal item is already present
    unique.Add(item);
}
return unique;
The issue here is that you may end up gobbling up twice the original memory.
I did some extra research, and I think you can achieve just what you want by simply doing:
// 'original' is your original data (any IEnumerable<YourClass>)
var unique = new HashSet<YourClass>(original, new CustomEqualityComparer());
That should take care of removing duplicated data, as no duplication is allowed in a HashSet. I'd recommend that you also take a look at this question about GetHashCode implementation guidelines.
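To make the HashSet approach concrete, here is a minimal sketch. The field types (int a, string c, int f), the AcfComparer name and the Dedupe helper are assumptions for illustration only, since the question does not give them. It also assumes C# 7 or later, because the hash is built from a value tuple.

```csharp
using System;
using System.Collections.Generic;

// Field types are assumed for illustration; b and e are omitted for brevity.
public class MyObject
{
    public int id;
    public int a;
    public string c;
    public int f;
}

// Hypothetical comparer: two objects are equal when a, c and f all match.
public class AcfComparer : IEqualityComparer<MyObject>
{
    public bool Equals(MyObject x, MyObject y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.a == y.a && x.c == y.c && x.f == y.f;
    }

    // A value tuple combines the hashes of its elements, so the result
    // depends on which field holds which value, unlike a plain sum.
    public int GetHashCode(MyObject obj) => (obj.a, obj.c, obj.f).GetHashCode();
}

public static class HashSetDemo
{
    // The HashSet constructor adds every element and skips duplicates.
    public static HashSet<MyObject> Dedupe(IEnumerable<MyObject> original)
    {
        return new HashSet<MyObject>(original, new AcfComparer());
    }

    public static void Main()
    {
        var original = new List<MyObject>
        {
            new MyObject { id = 1, a = 1, c = "x", f = 2 },
            new MyObject { id = 2, a = 1, c = "x", f = 2 }, // same (a, c, f) as id 1
            new MyObject { id = 3, a = 2, c = "x", f = 1 },
        };

        Console.WriteLine(Dedupe(original).Count); // prints 2
    }
}
```

Since MyObject is a class here, the set only stores references to the existing objects, so the extra memory is the set's own bookkeeping rather than a second copy of every object.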
If you want to know more about the HashSet class, follow these links:
About HashSet
About IEqualityComparer constructor
IEqualityComparer documentation
Hope this helps.
One efficient method would be to first do a quicksort (or a similar n log n sort) based on a hash of (a, c, f), and then iterate through the resulting list, picking one item every time the value of (a, c, f) changes.
That would give an n log n solution, which is probably the best you can do with a sort-based approach.
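A rough sketch of the sort-then-scan idea follows. Note that it sorts on (a, c, f) directly instead of on a hash of it, so two different keys that happen to share a hash cannot end up interleaved. The field types and the SortDedup and CompareKey names are assumptions made up for this example.

```csharp
using System;
using System.Collections.Generic;

// Field types are assumed for illustration.
public class MyObject
{
    public int id;
    public int a;
    public string c;
    public int f;
}

public static class SortDedup
{
    // Orders by a, then c, then f; returns 0 only when all three match.
    static int CompareKey(MyObject x, MyObject y)
    {
        int r = x.a.CompareTo(y.a);
        if (r != 0) return r;
        r = string.CompareOrdinal(x.c, y.c);
        if (r != 0) return r;
        return x.f.CompareTo(y.f);
    }

    public static List<MyObject> Dedup(List<MyObject> items)
    {
        // Sort a copy so the caller's list keeps its order: O(n log n).
        var sorted = new List<MyObject>(items);
        sorted.Sort(CompareKey);

        // Single pass: keep an item only when its key differs from the last kept one.
        var result = new List<MyObject>();
        foreach (var item in sorted)
        {
            if (result.Count == 0 || CompareKey(result[result.Count - 1], item) != 0)
                result.Add(item);
        }
        return result;
    }

    public static void Main()
    {
        var items = new List<MyObject>
        {
            new MyObject { id = 1, a = 2, c = "x", f = 1 },
            new MyObject { id = 2, a = 1, c = "x", f = 2 },
            new MyObject { id = 3, a = 2, c = "x", f = 1 }, // same (a, c, f) as id 1
        };

        Console.WriteLine(Dedup(items).Count); // prints 2
    }
}
```

List<T>.Sort is not a stable sort, so which of several duplicates survives is not defined here. If the first occurrence has to win, a hash-based approach is the easier route.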
Drakko!
You can use the Distinct() method to get only the items that have different values for the properties you specify. You could do something like this:
List<MyObj> list = new List<MyObj>();
// Run the code that is going to populate your list.
var result = list.Select(myObj => new { myObj.a, myObj.c, myObj.f })
                 .Distinct().ToList();
// result holds one anonymous { a, c, f } object per distinct combination,
// not the original MyObj instances.
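If you need the original objects back instead of the projected values, one option on .NET 6 or later is Enumerable.DistinctBy. The sketch below assumes that runtime, and the field types and the DistinctByAcf helper name are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Field types are assumed for illustration.
public class MyObj
{
    public int id;
    public int a;
    public string c;
    public int f;
}

public static class DistinctByDemo
{
    // Keeps one MyObj per distinct (a, c, f) combination.
    // Value tuples compare element by element, so they work as a composite key.
    public static List<MyObj> DistinctByAcf(IEnumerable<MyObj> items)
    {
        return items.DistinctBy(o => (o.a, o.c, o.f)).ToList();
    }

    public static void Main()
    {
        var list = new List<MyObj>
        {
            new MyObj { id = 1, a = 1, c = "x", f = 2 },
            new MyObj { id = 2, a = 1, c = "x", f = 2 }, // same (a, c, f) as id 1
            new MyObj { id = 3, a = 2, c = "x", f = 1 },
        };

        var result = DistinctByAcf(list);
        Console.WriteLine(result.Count); // prints 2
    }
}
```

This avoids writing a separate comparer class, at the cost of requiring a newer framework version.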
Code from this link worked great for me. https://nishantrana.me/2014/08/14/remove-duplicate-objects-in-list-in-c/
public class MyClass
{
    public string ID { get; set; }
    public string Value { get; set; }
}
List<MyClass> myList = new List<MyClass>();
var xrmOptionSet = new MyClass();
xrmOptionSet.ID = "1";
xrmOptionSet.Value = "100";
var xrmOptionSet1 = new MyClass();
xrmOptionSet1.ID = "2";
xrmOptionSet1.Value = "200";
var xrmOptionSet2 = new MyClass();
xrmOptionSet2.ID = "1";
xrmOptionSet2.Value = "100";
myList.Add(xrmOptionSet);
myList.Add(xrmOptionSet1);
myList.Add(xrmOptionSet2);
// here we are first grouping the items by ID and then picking the first item from each group
var myDistinctList = myList.GroupBy(i => i.ID)
                           .Select(g => g.First()).ToList();
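The same GroupBy trick can be pointed at the original question by grouping on all three fields at once. This is only a sketch: the field types and the FirstPerKey helper name are assumptions, and an anonymous type serves as the composite key because anonymous types compare by the values of their properties.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Field types are assumed for illustration.
public class MyObject
{
    public int id;
    public int a;
    public string c;
    public int f;
}

public static class GroupByDemo
{
    // Groups by (a, c, f) and keeps the first item of each group.
    public static List<MyObject> FirstPerKey(IEnumerable<MyObject> items)
    {
        return items.GroupBy(o => new { o.a, o.c, o.f })
                    .Select(g => g.First())
                    .ToList();
    }

    public static void Main()
    {
        var list = new List<MyObject>
        {
            new MyObject { id = 1, a = 1, c = "x", f = 2 },
            new MyObject { id = 2, a = 1, c = "x", f = 2 }, // same (a, c, f) as id 1
            new MyObject { id = 3, a = 2, c = "x", f = 1 },
        };

        var result = FirstPerKey(list);
        Console.WriteLine(string.Join(", ", result.Select(o => o.id))); // prints 1, 3
    }
}
```

GroupBy keeps groups in the order their keys first appear and keeps elements inside each group in source order, so the first occurrence of every duplicate is the one that survives. It does buffer all groups in memory, which is worth keeping in mind for a list of 500,000 items.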