I have the following item set from an XML file:

id  category
5   1
5   3
5   4
5   3
5   3
// First get the DataTable as dt.
// DataRowComparer.Default compares rows by column count and by the data in each column.
IEnumerable<DataRow> distinctRows = dt.AsEnumerable().Distinct(DataRowComparer.Default);
foreach (DataRow row in distinctRows)
{
    Console.WriteLine("{0,-15} {1,-15}",
        row.Field<int>(0),
        row.Field<string>(1));
}
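The snippet above assumes you already have the DataTable dt. A minimal sketch of one way to obtain it from the question's XML, assuming the data is saved to a file (the name items.xml is an assumption, and note that ReadXml without a schema infers string columns, so the Field<T> types above may need adjusting):

var ds = new DataSet();
ds.ReadXml("items.xml");   // hypothetical file containing the items
DataTable dt = ds.Tables[0];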
In addition to Jon Skeet's answer, you can also use a group by expression to get the unique groups along with a count of each group's occurrences:
var query = from e in doc.Elements("whatever")
            group e by new { id = (int) e.Attribute("id"), val = (int) e.Attribute("cat") } into g
            select new { id = g.Key.id, val = g.Key.val, count = g.Count() };
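You can then consume the groups directly, for example:

foreach (var g in query)
{
    Console.WriteLine("id={0}, val={1}, count={2}", g.id, g.val, g.count);
}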
Since we are talking about having every element exactly once, a "set" makes more sense to me.
Example with classes and IEqualityComparer implemented:
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }

    public Product(int x, string y)
    {
        Id = x;
        Name = y;
    }
}
public class ProductCompare : IEqualityComparer<Product>
{
    public bool Equals(Product x, Product y)
    {
        // Check whether the compared objects reference the same data.
        if (Object.ReferenceEquals(x, y)) return true;

        // Check whether either of the compared objects is null.
        if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
            return false;

        // Check whether the products' properties are equal.
        return x.Id == y.Id && x.Name == y.Name;
    }

    public int GetHashCode(Product product)
    {
        // Check whether the object is null.
        if (Object.ReferenceEquals(product, null)) return 0;

        // Get the hash code for the Name field if it is not null.
        int hashProductName = product.Name == null ? 0 : product.Name.GetHashCode();

        // Get the hash code for the Id field.
        int hashProductId = product.Id.GetHashCode();

        // Calculate the hash code for the product.
        return hashProductName ^ hashProductId;
    }
}
Now:

List<Product> originalList = new List<Product> { new Product(1, "ad"), new Product(1, "ad") };
var setList = new HashSet<Product>(originalList, new ProductCompare()).ToList();

setList will contain only unique elements.
I thought of this while dealing with .Except(), which returns a set difference.
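For illustration, the same comparer also works with .Except(); the second list here is made up for the example:

var listA = new List<Product> { new Product(1, "ad"), new Product(2, "xy") };
var listB = new List<Product> { new Product(1, "ad") };
var difference = listA.Except(listB, new ProductCompare()).ToList();
// difference now holds only the Product with Id 2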
Are you trying to be distinct by more than one field? If so, just use an anonymous type and the Distinct operator and it should be okay:
var query = doc.Elements("whatever")
               .Select(element => new {
                   id = (int) element.Attribute("id"),
                   category = (int) element.Attribute("cat") })
               .Distinct();
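You could then iterate over the resulting anonymous-typed pairs, for example:

foreach (var item in query)
{
    Console.WriteLine("{0} {1}", item.id, item.category);
}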
If you're trying to get a distinct set of values of a "larger" type, but only looking at some subset of properties for the distinctness aspect, you probably want DistinctBy as implemented in MoreLINQ in DistinctBy.cs:
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IEqualityComparer<TKey> comparer)
{
    HashSet<TKey> knownKeys = new HashSet<TKey>(comparer);
    foreach (TSource element in source)
    {
        if (knownKeys.Add(keySelector(element)))
        {
            yield return element;
        }
    }
}
(If you pass in null as the comparer, it will use the default comparer for the key type.)
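A sketch of how this could be applied to the question's XML, passing null to fall back to the default key comparer (the element and attribute names follow the earlier example):

var distinctElements = doc.Elements("whatever")
                          .DistinctBy(element => new {
                              id = (int) element.Attribute("id"),
                              category = (int) element.Attribute("cat") },
                              null);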
I'm a bit late to the answer, but you may want to do this if you want the whole element, not only the values you want to group by:
var query = doc.Elements("whatever")
               .GroupBy(element => new {
                   id = (int) element.Attribute("id"),
                   category = (int) element.Attribute("cat") })
               .Select(e => e.First());
This will give you the first whole element matching your group-by selection, much like Jon Skeet's second example using DistinctBy, but without implementing an IEqualityComparer. DistinctBy will most likely be faster, but the solution above involves less code if performance is not an issue.
Just use Distinct() with your own comparer: http://msdn.microsoft.com/en-us/library/bb338049.aspx
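A minimal sketch of such a comparer for the question's XML (the class name IdCategoryComparer and the null checks are mine; the attribute names follow the earlier examples):

public class IdCategoryComparer : IEqualityComparer<XElement>
{
    public bool Equals(XElement x, XElement y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return (int) x.Attribute("id") == (int) y.Attribute("id")
            && (int) x.Attribute("cat") == (int) y.Attribute("cat");
    }

    public int GetHashCode(XElement element)
    {
        return ((int) element.Attribute("id") * 397) ^ (int) element.Attribute("cat");
    }
}

// Usage:
var distinct = doc.Elements("whatever").Distinct(new IdCategoryComparer());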