What is the best way to remove duplicates from a datatable?

醉酒成梦 2021-01-02 13:23

I have searched this site and the web but could not find a simple solution to this problem.

I have a datatable which has about 20 columns and 10K

12 answers
  • 2021-01-02 13:55

    Keep in mind that Table.AcceptChanges() must be called to complete the deletion. Otherwise a row removed with DataRow.Delete() is still present in the DataTable with its RowState set to Deleted, and Table.Rows.Count does not change after the deletion.
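
    To illustrate the point above, here is a minimal sketch. The one-column table and the `AcceptChangesDemo` class are made up for illustration and are not part of the original question. Note that the rows are committed once before deleting, because a row that is still in the Added state is removed immediately by Delete().

    ```csharp
    using System;
    using System.Data;

    public static class AcceptChangesDemo
    {
        // Returns the row count and row state seen right after Delete(),
        // and the row count seen after AcceptChanges().
        public static (int afterDelete, DataRowState stateAfterDelete, int afterAccept) Run()
        {
            // Hypothetical table, used only for this demonstration.
            var table = new DataTable("Demo");
            table.Columns.Add("ID", typeof(int));
            table.Rows.Add(1);
            table.Rows.Add(2);
            table.AcceptChanges();      // rows become Unchanged instead of Added

            table.Rows[0].Delete();     // marks the row as Deleted; it stays in the collection
            int afterDelete = table.Rows.Count;
            DataRowState stateAfterDelete = table.Rows[0].RowState;

            table.AcceptChanges();      // drops every row that is marked Deleted
            int afterAccept = table.Rows.Count;

            return (afterDelete, stateAfterDelete, afterAccept);
        }
    }
    ```

    Calling `AcceptChangesDemo.Run()` should report two rows and a Deleted state before the final AcceptChanges() call, and one row after it.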

  • 2021-01-02 13:58

    If you have access to LINQ, you should be able to use the built-in grouping functionality on the in-memory collection to pick out the duplicate rows.

    Search for "LINQ group by" to find examples.
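
    A minimal sketch of the grouping idea, assuming the duplicates are defined by two hypothetical key columns, "ID" (int) and "Name" (string). The class and method names are illustrative, and the key columns would need to be adjusted to the real table.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Linq;

    public static class GroupByDedup
    {
        // Keeps the first row of every group that shares the same key values.
        public static DataTable RemoveDuplicates(DataTable source)
        {
            List<DataRow> firstRows = source.AsEnumerable()
                .GroupBy(row => new { Id = row.Field<int>("ID"), Name = row.Field<string>("Name") })
                .Select(group => group.First())
                .ToList();

            // CopyToDataTable throws on an empty sequence, so return an empty clone instead.
            return firstRows.Count > 0 ? firstRows.CopyToDataTable() : source.Clone();
        }

        // Returns every row after the first one in each group, i.e. the duplicates.
        public static List<DataRow> FindDuplicates(DataTable source)
        {
            return source.AsEnumerable()
                .GroupBy(row => new { Id = row.Field<int>("ID"), Name = row.Field<string>("Name") })
                .SelectMany(group => group.Skip(1))
                .ToList();
        }
    }
    ```

    Grouping on an anonymous type compares the string key case-sensitively. If case-insensitive matching is needed, a custom IEqualityComparer is probably the simpler route.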

  • 2021-01-02 13:58

    Found this on bytes.com:

    You can use the JET 4.0 OLE DB provider with the classes in the System.Data.OleDb namespace to access the comma-delimited text file (using a DataSet/DataTable).

    Or you could use the Microsoft Text Driver for ODBC with the classes in the System.Data.Odbc namespace to access the file through ODBC drivers.

    That would allow you to access your data via SQL queries, as others proposed.

  • 2021-01-02 13:59

    You can use LINQ to DataSet. Something like this:

    // Fill the DataSet. FillDataSet is a helper method that is not shown here.
    DataSet ds = new DataSet();
    ds.Locale = CultureInfo.InvariantCulture;
    FillDataSet(ds);
    
    List<DataRow> rows = new List<DataRow>();
    
    DataTable contact = ds.Tables["Contact"];
    
    // Get 100 rows from the Contact table.
    IEnumerable<DataRow> query = (from c in contact.AsEnumerable()
                                  select c).Take(100);
    
    DataTable contactsTableWith100Rows = query.CopyToDataTable();
    
    // Add 100 rows to the list.
    foreach (DataRow row in contactsTableWith100Rows.Rows)
        rows.Add(row);
    
    // Create duplicate rows by adding the same 100 rows to the list.
    foreach (DataRow row in contactsTableWith100Rows.Rows)
        rows.Add(row);
    
    DataTable table =
        System.Data.DataTableExtensions.CopyToDataTable<DataRow>(rows);
    
    // Find the unique contacts in the table.
    IEnumerable<DataRow> uniqueContacts =
        table.AsEnumerable().Distinct(DataRowComparer.Default);
    
    Console.WriteLine("Unique contacts:");
    foreach (DataRow uniqueContact in uniqueContacts)
    {
        Console.WriteLine(uniqueContact.Field<Int32>("ContactID"));
    }
    
  • 2021-01-02 14:06

    See the question "How can I remove duplicate rows?" and adjust the query there to join on your 4 key columns.

    EDIT: with your new information I believe the easiest way would be to implement IEqualityComparer<T> and use Distinct on your data rows. Otherwise if you're working with IEnumerable/IList instead of DataTable/DataRow, it is certainly possible with some LINQ-to-objects kung-fu.

    EDIT: example IEqualityComparer

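    using System;
    using System.Collections.Generic;
    using System.Data;

    public class MyRowComparer : IEqualityComparer<DataRow>
    {
        // One case-insensitive comparer for both methods, so Equals and GetHashCode agree.
        private static readonly StringComparer NameComparer = StringComparer.CurrentCultureIgnoreCase;

        public bool Equals(DataRow x, DataRow y)
        {
            // ... extend this to include all your 4 keys
            return x.Field<int>("ID") == y.Field<int>("ID") &&
                NameComparer.Equals(x.Field<string>("Name"), y.Field<string>("Name"));
        }

        public int GetHashCode(DataRow obj)
        {
            // ... combine the hash codes of the remaining keys in the same way
            string name = obj.Field<string>("Name");
            return obj.Field<int>("ID").GetHashCode() ^ (name == null ? 0 : NameComparer.GetHashCode(name));
        }
    }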
    

    You can use it like this:

    var uniqueRows = myTable.AsEnumerable().Distinct(new MyRowComparer());
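
    The comparer above hard-codes its key columns. As a variation, here is a sketch of a comparer that takes the key column names as constructor arguments, so all four keys can be passed in without editing the class. `KeyColumnsComparer` is a hypothetical name, and unlike the example above it compares string values case-sensitively.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Linq;

    // Compares two rows by the values in a caller-supplied list of key columns.
    public sealed class KeyColumnsComparer : IEqualityComparer<DataRow>
    {
        private readonly string[] _keyColumns;

        public KeyColumnsComparer(params string[] keyColumns)
        {
            _keyColumns = keyColumns;
        }

        public bool Equals(DataRow x, DataRow y)
        {
            if (ReferenceEquals(x, y)) return true;
            if (x is null || y is null) return false;
            // object.Equals works for boxed values and for DBNull.Value.
            return _keyColumns.All(column => object.Equals(x[column], y[column]));
        }

        public int GetHashCode(DataRow row)
        {
            unchecked
            {
                // Combine the hash codes of the key values only, matching Equals.
                int hash = 17;
                foreach (string column in _keyColumns)
                    hash = hash * 31 + (row[column]?.GetHashCode() ?? 0);
                return hash;
            }
        }
    }
    ```

    It could then be used as `myTable.AsEnumerable().Distinct(new KeyColumnsComparer("ID", "Name"))`, with the real key column names substituted for "ID" and "Name".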
    
  • 2021-01-02 14:07

    I think the best way to remove duplicates from a DataTable is to use LINQ or MoreLINQ:

    Linq

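    DataTable distinctTable = RemoveDuplicatesRecords(yourDataTable);


    private DataTable RemoveDuplicatesRecords(DataTable dt)
    {
        // DataRowComparer.Default compares rows by value across all columns.
        var uniqueRows = dt.AsEnumerable().Distinct(DataRowComparer.Default);
        // CopyToDataTable throws InvalidOperationException if the source has no rows.
        return uniqueRows.CopyToDataTable();
    }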
    

    Blog article: Remove Duplicate rows records from DataTable Asp.net c#


    MoreLinq

    // Distinct by the column named Id
    var valueDistinctByIdColumn = yourTable.AsEnumerable().DistinctBy(row => new { Id = row["Id"] });
    DataTable dtDistinctByIdColumn = valueDistinctByIdColumn.CopyToDataTable();
    

    Note: MoreLINQ is a separate library that has to be added to the project.

    MoreLINQ provides a DistinctBy method that lets you specify the property on which to find distinct objects.

    Blog article: Using moreLinq DistinctBy method to remove duplicate records
