I have read a CSV file into an R data.frame. Some of the rows have the same element in one of the columns. I would like to remove rows that are duplicates in that column.
The function distinct() in the dplyr package removes duplicate rows, either based on specific columns/variables (as in this question) or considering all columns/variables. dplyr is part of the tidyverse.
Data and package
library(dplyr)
dat <- data.frame(a = rep(c(1,2),4), b = rep(LETTERS[1:4],2))
Remove rows duplicated in a specific column (e.g., column a)
Note that .keep_all = TRUE retains all columns; otherwise only column a would be retained.
distinct(dat, a, .keep_all = TRUE)
a b
1 1 A
2 2 B
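For contrast, here is a minimal sketch of the same call without .keep_all, which should return only the column named in the call. The data frame is re-created so the snippet runs on its own; the name only_a is just an illustrative variable, not part of the original answer.

```r
library(dplyr)

# Same example data as above
dat <- data.frame(a = rep(c(1, 2), 4), b = rep(LETTERS[1:4], 2))

# Without .keep_all = TRUE, only the column(s) named in the call are returned
only_a <- distinct(dat, a)

names(only_a)  # column b is dropped
nrow(only_a)   # one row per distinct value of a
```

If column b is still needed afterwards, .keep_all = TRUE is the safer default; the row kept for each value of a is the first one encountered.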
Remove rows that are complete duplicates of other rows:
distinct(dat)
a b
1 1 A
2 2 B
3 1 C
4 2 D
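If loading dplyr is not an option, base R can likely produce the same two results. This is a sketch of an alternative technique (duplicated() and unique()), not part of the dplyr approach above; the variable names first_per_a and unique_rows are illustrative only.

```r
# Same example data as above
dat <- data.frame(a = rep(c(1, 2), 4), b = rep(LETTERS[1:4], 2))

# Keep the first row for each value of column a (base R, no dplyr needed)
first_per_a <- dat[!duplicated(dat$a), ]

# Drop rows that are complete duplicates of an earlier row
unique_rows <- unique(dat)
```

Both calls keep the original row names, which is the main visible difference from the distinct() output.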