Match/group duplicate rows (indices)

情深已故 2021-02-05 09:47

How can I efficiently match/group the indices of duplicated rows?

Let's say I have this data set:

set.seed(14)
dat <- data.frame(mtc

2 Answers
  •  情深已故
    2021-02-05 10:42

    Here's a possibility using "data.table":

    library(data.table)
    as.data.table(dat)[, c("GRP", "N") := .(.GRP, .N), by = names(dat)][
                       N > 1, list(list(.I)), by = GRP]
    ##    GRP             V1
    ## 1:   1      1,4,5,6,9
    ## 2:   2           2,13
    ## 3:   3  3, 7, 8,10,11
    

    The basic idea is to add a column that identifies each group of identical rows (using .GRP) and a column that counts the rows in each group (using .N), then keep only the groups with more than one row (N > 1) and collect the row indices (.I) of each remaining group into a list, by "GRP".
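
For comparison, the same grouping idea can be sketched in base R, without data.table. This is only an illustration and not the answer's code: `toy` is a made-up data frame (the question's `dat` is not fully shown here), and `key`, `grp`, `idx` and `dups` are hypothetical names chosen for the example.

```r
# Hypothetical toy data, NOT the question's `dat`:
# rows 1, 3, 5 are identical; rows 2, 4 are identical; row 6 is unique.
toy <- data.frame(a = c(1, 2, 1, 2, 1, 3),
                  b = c("x", "y", "x", "y", "x", "z"))

# One string key per row, built by pasting all columns together.
# Assumption: no value contains the separator "\r".
key <- do.call(paste, c(toy, sep = "\r"))

# Group id in order of first appearance (plays the role of .GRP).
grp <- match(key, unique(key))

# Row indices per group (plays the role of .I collected by GRP).
idx <- split(seq_len(nrow(toy)), grp)

# Keep only groups with more than one row (plays the role of N > 1).
dups <- idx[lengths(idx) > 1]
dups
```

With this toy data, `dups` holds rows 1, 3, 5 in one group and rows 2, 4 in the other; row 6 is dropped because it has no duplicate.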
