How do I remove duplicates from a datatable altogether based on a column's value?

问题

I have 3 columns in a DataTable

Id Name Count

1 James 4345

2 Kristen 89231

3 James 599

4 Suneel 317113

I need rows 1 and 3 gone, and the new datatable returning only rows 2 and 4. I found a really good related question in the suggestions on SO--this guy. But his solution uses hashtables, and only eliminates row 3, not both 1 and 3. Help!

回答1:

I tried this Remove duplicates from a datatable..

using System.Data;
using System.Linq;
...
//assuming 'ds' is your DataSet
//and that ds has only one DataTable, therefore that table's index is '0'
DataTable dt = ds.Tables[0];
DataView dv = new DataView(dt);
string cols = string.Empty;
foreach (DataColumn col in dt.Columns)
{
if (!string.IsNullOrEmpty(cols)) cols += ",";
cols += col.ColumnName;
}
dt = dv.ToTable(true, cols.Split(','));
ds.Tables.RemoveAt(0);
ds.Tables.Add(dt);

Following single line of code will avoid the duplicate rows.

ds.Tables["Employee"].DefaultView.ToTable(true,"Name");

ds – Dataset object

dt.DefaultView.ToTable( true, "Name");

dt – DataTable object

回答2:

How about something like this;

Pseudo code: Assuming the object has 3 properties: [Id, Name, Value] and called NameObjects and is IEnumerable (List NameObjects;)

var _newNameObjectList = new List<NameObject>();

foreach(var nameObject in NameObjecs)
{
   if(_newNameObjectList.Select(x => x.Name == nameObject.Name).ToList().Count > 0)
   {
      _newNameObjectList.RemoveAll(x => x.Name == nameObject.Name);
      continue;
   }
   else
   {
      _newNameObjectList.Add(nameObject); 
   }
}

This should work. This uses the namespace System.Linq;

回答3:

Okay, so I looked at the blog pointed out to me by Pandiya. In the comments section, a chap called Kevin Morris has posted a solution using a C# Dictionary, which worked for me.

In my main block, I wrote:

string keyColumn = "Website";
RemoveDuplicates(table1, keyColumn);

And my RemoveDuplicates function was defined as:

    private void RemoveDuplicates(DataTable table1, string keyColumn)
{
    Dictionary<string, string> uniquenessDict = new Dictionary<string, string>(table1.Rows.Count);
    StringBuilder sb = null;
    int rowIndex = 0;
    DataRow row;
    DataRowCollection rows = table1.Rows;
    while (rowIndex < rows.Count - 1)
    {
        row = rows[rowIndex];
        sb = new StringBuilder();
            sb.Append(((string)row[keyColumn]));


        if (uniquenessDict.ContainsKey(sb.ToString()))
        {
            rows.Remove(row);
            if (RemoveAllDupes)
            {
                row = rows[rowIndex - 1];
                rows.Remove(row);
            }
        }
        else
        {
            uniquenessDict.Add(sb.ToString(), string.Empty);
            rowIndex++;
        }
    }
}

If you go to the blog, you will find a more generic function that allows sniffing dupes over multiple columns. I've added a flag--RemoveAllDupes--in case I want to remove all duplicate rows, but this still assumes that the rows are ordered by name, and involves only duplicates and not triplicates, quadruplicates and so on. If anyone can, please update this code to reflect removal of such.

来源：https://stackoverflow.com/questions/3200751/how-do-i-remove-duplicates-from-a-datatable-altogether-based-on-a-columns-value

标签

datatable

hashtable

duplicate-removal