Remove duplicated rows


清酒与你 2020-11-22 00:00

I have read a CSV file into an R data.frame. Some of the rows have the same element in one of the columns. I would like to remove rows that are duplicates in th

11 Answers

忘掉有多难 (original poster)

2020-11-22 00:37
The distinct() function in the dplyr package removes duplicate rows, either based on specific columns/variables (as in this question) or considering all columns/variables. dplyr is part of the tidyverse.

Data and package
```
library(dplyr)
dat <- data.frame(a = rep(c(1,2),4), b = rep(LETTERS[1:4],2))
```
Remove rows duplicated in a specific column (e.g., column a):

Note that .keep_all = TRUE retains all columns; otherwise, only column a would be retained.
```
distinct(dat, a, .keep_all = TRUE)

  a b
1 1 A
2 2 B
```
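To make the effect of .keep_all concrete, here is a minimal sketch that runs both forms of the call on the same example data. The names only_a and full_rows are illustrative and not part of the original answer.

```r
library(dplyr)

# Same example data as in the answer
dat <- data.frame(a = rep(c(1, 2), 4), b = rep(LETTERS[1:4], 2))

# Without .keep_all, only the column named in the call (a) is returned
only_a <- distinct(dat, a)

# With .keep_all = TRUE, the first row found for each value of a is kept in full
full_rows <- distinct(dat, a, .keep_all = TRUE)

names(only_a)
names(full_rows)
```

Both results have one row per distinct value of a; they differ only in which columns come back.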
Remove rows that are complete duplicates of other rows:
```
distinct(dat)

  a b
1 1 A
2 2 B
3 1 C
4 2 D
```
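As a quick sanity check, the number of rows dropped can be computed by comparing row counts before and after the call. This is only a sketch on the example data; n_removed is an illustrative name.

```r
library(dplyr)

# Same example data as in the answer
dat <- data.frame(a = rep(c(1, 2), 4), b = rep(LETTERS[1:4], 2))

# Rows in the original minus rows left after removing complete duplicates
n_removed <- nrow(dat) - nrow(distinct(dat))
n_removed
```

For this data the result is 4, since each of the four distinct (a, b) pairs appears twice among the eight rows.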