How can I efficiently match/group the indices of duplicated rows?
Let's say I have this data set:
set.seed(14)
dat <- data.frame(mtc
Here's a possibility using "data.table":
library(data.table)
as.data.table(dat)[, c("GRP", "N") := .(.GRP, .N), by = names(dat)][
N > 1, list(list(.I)), by = GRP]
##    GRP            V1
## 1:   1     1,4,5,6,9
## 2:   2          2,13
## 3:   3  3, 7, 8,10,11
The basic idea is to create a column that "groups" the other columns (using .GRP) as well as a column that counts how many rows fall into each group (using .N), then keep only the groups that contain more than one row, and collect the row indices (.I) of each remaining group into a list.
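To make the steps easier to follow, here is a minimal sketch that runs them one at a time on a small made-up data set. The names `toy`, `dt`, `ROW`, `rows` and `res` are hypothetical and not part of the answer above. Storing the row number in an explicit `ROW` column before filtering is a choice made here for readability, not something the one-liner does.

```r
library(data.table)

# Hypothetical toy data (not the data set from the question):
# rows 1, 3 and 5 are identical, rows 2 and 4 are identical, row 6 is unique.
toy <- data.frame(a = c(1, 2, 1, 2, 1, 3),
                  b = c("x", "y", "x", "y", "x", "z"))

dt <- as.data.table(toy)

# Step 1: label each distinct combination of values (.GRP)
# and count how many rows share it (.N).
dt[, c("GRP", "N") := .(.GRP, .N), by = names(toy)]

# Step 2: store the original row number in its own column,
# so it is unambiguous after the filter in step 3.
dt[, ROW := .I]

# Step 3: keep only groups with more than one row and
# collect their row numbers into a list column.
res <- dt[N > 1, .(rows = list(ROW)), by = GRP]

res$rows[[1]]  # 1 3 5
res$rows[[2]]  # 2 4
```

The unique row (row 6) gets N equal to 1 and is dropped by the filter, so only the two duplicated groups appear in `res`.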