Get duplicates for two columns with LINQ

本秂侑毒 提交于 2019-12-10 15:31:22

问题


LINQ drives me crazy. Why does following query not return the duplicates, whereas it works with only one identifier? Where is my error?

' generate some test-data '
Dim source As New DataTable
source.Columns.Add(New DataColumn("RowNumber", GetType(Int32)))
source.Columns.Add(New DataColumn("Value1", GetType(Int32)))
source.Columns.Add(New DataColumn("Value2", GetType(Int32)))
source.Columns.Add(New DataColumn("Text", GetType(String)))
Dim rnd As New Random()
For i As Int32 = 1 To 100
    Dim newRow = source.NewRow
    Dim value = rnd.Next(1, 20)
    newRow("RowNumber") = i
    newRow("Value1") = value
    newRow("Value2") = (value + 1)
    newRow("Text") = String.Format("RowNumber{0}-Text", i)
    source.Rows.Add(newRow)
Next
' following query does not work, it always has Count=0 '
' although it works with only one identifier '
Dim dupIdentifiers = From row In source
         Group row By grp = New With {.Val1 = row("Value1"), .Val2 = row("Value2")}
         Into Group
         Where Group.Count > 1
         Select idGroup = New With {grp.Val1, grp.Val2, Group.Count}

Edit: Following is the complete solution, thanks to @Jon Skeet's answer :)

Dim dupKeys = From row In source
        Group row By grp = New With {Key .Val1 = CInt(row("Value1")), Key .Val2 = CInt(row("Value2"))}
        Into Group Where Group.Count > 1
        Select RowNumber = CInt(Group.FirstOrDefault.Item("RowNumber"))

Dim dupRows = From row In source
        Join dupKey In dupKeys 
        On row("RowNumber") Equals dupKey 
        Select row

If dupRows.Any Then
    ' create a new DataTable from the first duplicate rows '
    Dim dest = dupRows.CopyToDataTable
End If

The main problem with grouping was that i must make them key properties. The next problem in my above code was to get the duplicate rows from the original table. Because nearly every row has a duplicate(according to two fields), the result DataTable contained 99 of 100 rows and not only the 19 duplicate values. I needed to select only the first duplicate row and join them with the original table on the PK.

Select RowNumber = CInt(Group.FirstOrDefault.Item("RowNumber"))

Although this works in my case, maybe someone can explain me how to select only the duplicates from the original table if i would have had only composite keys.


Edit: I'v answered the last part of the question myself, so here is all i need:

Dim dups = From row In source
         Group By grp = New With {Key .Value1 = CInt(row("Value1")), Key .Value2 = CInt(row("Value2"))}
         Into Group Where Group.Count > 1
         Let Text = Group.First.Item("Text")
         Select Group.First

If dups.Any Then
      Dim dest = dups.CopyToDataTable
End If

I needed the Let-Keyword in order to keep the other column(s) into the same context and return only the first row of the grouped dups. On this way i can use CopyToDataTable to create a DataTable from the duplicate rows.

Only a few lines of code overall (i can save the second query to find the rows in the original table) to find duplicates on multiple columns and create a DataTable of them.


回答1:


The problem is the way anonymous types work in VB - they're mutable by default; only Key properties are included for hashing and equality. Try this:

Group row By grp = New With {Key .Val1 = row("Value1"), Key .Val2 = row("Value2")}

(In C# this wouldn't be a problem - anonymous types in C# are always immutable in all properties.)



来源:https://stackoverflow.com/questions/7530906/get-duplicates-for-two-columns-with-linq

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!