I have a List<> of objects in C# and I need a way to return those objects that are considered duplicates within the list. I do not need the Distinct result set; I need a
var duplicates = from car in cars
group car by car.Color into grouped
from car in grouped.Skip(1)
select car;
This groups the cars by color and then skips the first result from each group, returning the remainder from each group flattened into a single sequence.
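Here is a minimal, self-contained sketch of that query. The Car class and the DuplicateFinder/SkipFirstPerColor names are hypothetical. Only the Color property matters for the grouping, and Id is there so the result is easy to check. In LINQ to Objects, GroupBy yields groups in the order their keys first appear and keeps each group's elements in source order. As a result, the car that is kept is the first one of each color in the list.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical Car type for illustration; only Color is used for grouping.
public class Car
{
    public int Id { get; set; }
    public string Color { get; set; } = "";
}

public static class DuplicateFinder
{
    // Returns every car after the first one in each color group,
    // i.e. the cars you would remove to leave one car per color.
    public static List<Car> SkipFirstPerColor(List<Car> cars)
    {
        var duplicates = from car in cars
                         group car by car.Color into grouped
                         from car in grouped.Skip(1)
                         select car;
        return duplicates.ToList();
    }
}
```

With cars 1 Red, 2 Blue, 3 Red, 4 Green, 5 Red and 6 Blue, this returns cars 3, 5 and 6.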
If you have particular requirements about which one you want to keep, e.g. if the car has an Id property and you want to keep the car with the lowest Id, then you could add some ordering in there, e.g.
var duplicates = from car in cars
group car by car.Color into grouped
from car in grouped.OrderBy(c => c.Id).Skip(1)
select car;
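The same "keep the lowest Id" query can also be written with LINQ method syntax. This is a sketch that uses the same kind of hypothetical Car class with Id and Color properties. The class and method names are illustrative only.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical Car type for illustration.
public class Car
{
    public int Id { get; set; }
    public string Color { get; set; } = "";
}

public static class DuplicateFinder
{
    // Method-syntax version of the ordered query: within each color group,
    // sort by Id, keep the lowest-Id car and return all the others.
    public static List<Car> DuplicatesKeepingLowestId(IEnumerable<Car> cars)
    {
        return cars
            .GroupBy(c => c.Color)
            .SelectMany(g => g.OrderBy(c => c.Id).Skip(1))
            .ToList();
    }
}
```

Because of the OrderBy, the kept car no longer depends on the order of the input list. With cars 5 Red, 2 Red, 9 Blue, 7 Red and 1 Blue, cars 2 and 1 are kept and 5, 7 and 9 are returned.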
Without actually coding it, how about an algorithm something like this:
1. Iterate through your List<T>, creating a Dictionary<T, int> that counts how many times each item occurs.
2. Iterate through the Dictionary<T, int>, deleting entries where the int is 1.
Anything left in the Dictionary has duplicates. The second part where you actually delete is optional, of course. You can just iterate through the Dictionary and look for the >1's to take action.
EDIT: OK, I bumped up Ryan's since he actually gave you code. ;)
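The two dictionary steps described above could be sketched as follows. This is an illustration, not the answerer's code. The helper name FindDuplicateKeys is hypothetical. The sketch assumes items compare correctly with the default equality comparer, so for a class like Car you would either override Equals/GetHashCode or count a property such as Color instead.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DictionaryDuplicates
{
    // Hypothetical helper: returns each value that occurs more than once in the list.
    public static List<T> FindDuplicateKeys<T>(List<T> items) where T : notnull
    {
        // Step 1: walk the list, counting occurrences of each item.
        var counts = new Dictionary<T, int>();
        foreach (T item in items)
        {
            counts.TryGetValue(item, out int count); // count is 0 when the item is new
            counts[item] = count + 1;
        }

        // Step 2: delete the entries that were seen exactly once.
        // The keys are copied first so the dictionary is not modified while being enumerated.
        foreach (T key in counts.Keys.ToList())
        {
            if (counts[key] == 1)
                counts.Remove(key);
        }

        // Anything left in the dictionary has duplicates.
        return counts.Keys.ToList();
    }
}
```

To find the colors that appear more than once, you could call it as FindDuplicateKeys(cars.Select(c => c.Color).ToList()).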
Here's a slightly different LINQ solution that I think makes it more obvious what you're trying to do:
var s = from car in cars
group car by car.Color into g
where g.Count() == 1
select g.First();
It just groups the cars by color, tosses out all the groups that have more than one element, and puts the single car from each remaining group into the returned IEnumerable<Car>.
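The same query can be written in method syntax. This is a sketch with a hypothetical Car class and illustrative names. As described, it yields the cars whose color appears exactly once.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical Car type for illustration.
public class Car
{
    public int Id { get; set; }
    public string Color { get; set; } = "";
}

public static class UniqueColorFinder
{
    // Groups by color, drops every group with more than one car,
    // and returns the single car left in each remaining group.
    public static List<Car> CarsWithUniqueColor(IEnumerable<Car> cars)
    {
        return cars
            .GroupBy(c => c.Color)
            .Where(g => g.Count() == 1)
            .Select(g => g.First())
            .ToList();
    }
}
```

Flipping the condition to g.Count() > 1 would instead keep only the color groups that do contain duplicates.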
My answer takes inspiration (in this order) from the following respondents: Joe Coehoorn, Greg Beech and Jon Skeet.
I decided to provide a full example, with the assumption being (for real-world efficiency) that you have a static list of car colors. I believe the following code illustrates a complete solution to the problem in an elegant, although not necessarily hyper-efficient, manner.
using System;
using System.Collections.Generic;
using System.Linq;

#region SearchForNonDistinctMembersInAGenericListSample
// CarSample is just a wrapper class so the members below compile on their own.
public static class CarSample
{
    public static string[] carColors = new[] { "Red", "Blue", "Green" };
    public static string[] carStyles = new[] { "Compact", "Sedan", "SUV", "Mini-Van", "Jeep" };

    public class Car
    {
        public Car() { }
        public string Color { get; set; }
        public string Style { get; set; }
    }

    public static List<Car> SearchForNonDistinctMembersInAList()
    {
        // pass in cars normally, but declare here for brevity
        var cars = new List<Car>(5) { new Car(){Color=carColors[0], Style=carStyles[0]},
            new Car(){Color=carColors[1],Style=carStyles[1]},
            new Car(){Color=carColors[0],Style=carStyles[2]},
            new Car(){Color=carColors[2],Style=carStyles[3]},
            new Car(){Color=carColors[0],Style=carStyles[4]}};

        List<Car> carDupes = new List<Car>();
        for (int i = 0; i < carColors.Length; i++)
        {
            // The lambda captures i; that is safe because it is only used within this iteration.
            Func<Car, bool> dupeMatcher = c => c.Color == carColors[i];
            int count = cars.Count<Car>(dupeMatcher);
            if (count > 1) // we have duplicates
            {
                // Keep the first car of this color; every later one is a dupe.
                foreach (Car dupe in cars.Where<Car>(dupeMatcher).Skip<Car>(1))
                {
                    carDupes.Add(dupe);
                }
            }
        }
        return carDupes;
    }
}
#endregion
I'm going to come back through here later and compare this solution to all three of its inspirations, just to contrast the styles. It's rather interesting.
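// Must be declared in a non-nested static class; needs System, System.Collections.Generic and System.Linq.
public static IQueryable<TSource> Duplicates<TSource>(this IEnumerable<TSource> source) where TSource : IComparable {
    if (source == null)
        throw new ArgumentNullException("source");
    // Returns every element that occurs more than once (including its first occurrence); O(n^2).
    return source.Where(x => source.Count(y => y.Equals(x)) > 1).AsQueryable<TSource>();
}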
IEnumerable<Car> GetDuplicateColors(List<Car> cars)
{
return cars.Where(c => cars.Any(c2 => c2.Color == c.Color && cars.IndexOf(c2) < cars.IndexOf(c) ) );
}
It basically means "return cars where there's any car in the list with the same color and a smaller index".
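Not sure of the performance, though: Any rescans the list for every car and each IndexOf call is another linear scan, so this is at least O(n²). I suspect an approach with an O(1) lookup for duplicates (like the dictionary/hashset method) will be faster for large sets.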
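Here is a sketch of that O(1)-lookup idea. It replaces the nested Any with a HashSet<string> of colors already seen. This is my own illustration with a hypothetical Car class and illustrative names, not code from the thread. It should return the same cars as the query above: every car whose color already appeared earlier in the list. That holds as long as each Car instance appears in the list only once.

```csharp
using System.Collections.Generic;

// Hypothetical Car type for illustration.
public class Car
{
    public int Id { get; set; }
    public string Color { get; set; } = "";
}

public static class HashSetDuplicates
{
    // Single pass over the list: HashSet<T>.Add returns false when the
    // color is already present, so those cars are the later duplicates.
    public static List<Car> GetDuplicateColors(List<Car> cars)
    {
        var seenColors = new HashSet<string>();
        var duplicates = new List<Car>();
        foreach (Car car in cars)
        {
            if (!seenColors.Add(car.Color))
                duplicates.Add(car);
        }
        return duplicates;
    }
}
```

HashSet<T>.Add is O(1) on average, so this runs in expected linear time instead of scanning the list again for every car.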