Remove duplicated rows

I have read a CSV file into an R data.frame. Some of the rows have the same element in one of the columns. I would like to remove rows that are duplicates in that column.

11 answers

执念已碎

2020-11-22 00:35

With sqldf:

```
# Example by Mehdi Nellen
a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)
```

Solution:

```
library(sqldf)
sqldf('SELECT DISTINCT * FROM df')
```

Output: each distinct (a, b) combination appears once, i.e. the five rows A 1, A 2, B 4, B 1 and C 2.
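For comparison, base R's `unique()` returns the same set of distinct rows without going through SQL. A minimal sketch on the same example data (row order and row names may differ from what sqldf prints):

```
# Same example data as above (by Mehdi Nellen)
a <- c(rep("A", 3), rep("B", 3), rep("C", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

# unique() on a data frame drops every row that repeats an earlier row
# across all columns, keeping the first occurrence
deduped <- unique(df)
print(deduped)
```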

星月不相逢

2020-11-22 00:36
You can also use dplyr's distinct() function. It tends to be faster than the alternatives, especially when you have many observations.
```
distinct_data <- dplyr::distinct(yourdata)
```
忘掉有多难

2020-11-22 00:37
The function distinct() in the dplyr package removes duplicate rows, based either on specific columns/variables (as in this question) or on all columns/variables. dplyr is part of the tidyverse.

Data and package
```
library(dplyr)
dat <- data.frame(a = rep(c(1,2),4), b = rep(LETTERS[1:4],2))
```
Remove rows duplicated in a specific column (e.g., column a):

Note that .keep_all = TRUE retains all columns; otherwise only column a would be kept.
```
distinct(dat, a, .keep_all = TRUE)

  a b
1 1 A
2 2 B
```
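The same single-column deduplication can be sketched in base R with `duplicated()`, which flags every row whose value in `a` already appeared in an earlier row. Negating it keeps the first occurrence of each value, which is also the row `distinct(..., .keep_all = TRUE)` retains:

```
# Same data as above
dat <- data.frame(a = rep(c(1, 2), 4), b = rep(LETTERS[1:4], 2))

# TRUE for rows whose value of `a` was already seen earlier
dup_a <- duplicated(dat$a)

# Keep only first occurrences; all columns are retained
first_a <- dat[!dup_a, ]
print(first_a)
```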
Remove rows that are complete duplicates of other rows:
```
distinct(dat)

  a b
1 1 A
2 2 B
3 1 C
4 2 D
```

孤城傲影

2020-11-22 00:39

Just subset your data frame to the columns you need, then use the unique() function :D

```
# e.g. if only the first three columns matter for deciding duplicates
deduped.data <- unique(yourdata[, 1:3])
# the fourth column no longer 'distinguishes' the rows,
# so they're duplicates and thrown out.
```
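Note that `unique(yourdata[, 1:3])` also drops the fourth column from the result. If you want to decide duplicates on a subset of columns but keep every column, one option is to run `duplicated()` on the subset and use it to index the full data frame. A sketch, where `yourdata` is a small hypothetical stand-in for the question's data:

```
# Hypothetical data: rows 1 and 2 match in the first three columns
# but differ in the fourth
yourdata <- data.frame(
  id    = c(1, 1, 2),
  name  = c("x", "x", "y"),
  group = c("g1", "g1", "g2"),
  note  = c("first", "second", "third")
)

# Flag duplicates using only columns 1:3 ...
dup_rows <- duplicated(yourdata[, 1:3])

# ... but subset the full data frame, so the fourth column is kept
deduped.data <- yourdata[!dup_rows, ]
print(deduped.data)
```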

野趣味

2020-11-22 00:43
Here's a very simple, fast dplyr/tidy solution:

Remove rows that are entirely the same:
```
library(dplyr)
iris %>% 
  distinct(.keep_all = TRUE)
```
Remove rows that are the same only in certain columns:
```
iris %>% 
  distinct(Sepal.Length, Sepal.Width, .keep_all = TRUE)
```
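`distinct()` keeps the first row of each duplicate group. If you need the last one instead, base R's `duplicated()` accepts `fromLast = TRUE`. A minimal sketch on a made-up toy data frame:

```
# Toy data for illustration only
toy <- data.frame(key = c("a", "b", "a", "b"), value = 1:4)

# fromLast = TRUE scans from the bottom, so the last row of each key
# counts as the original and earlier rows are flagged as duplicates
last_per_key <- toy[!duplicated(toy$key, fromLast = TRUE), ]
print(last_per_key)
```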

無奈伤痛

2020-11-22 00:51

A general approach, for example:

```
df <- data.frame(rbind(c(2,9,6), c(4,6,7), c(4,6,7), c(4,6,7), c(2,9,6)))

# keep only rows that do not repeat an earlier row
new_df <- df[!duplicated(df), ]
```

Output:

```
  X1 X2 X3
1  2  9  6
2  4  6  7
```
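A note on indexing: negating `which(duplicated(df))` is a common alternative to `!duplicated(df)`, but it fails when there are no duplicates. `which()` then returns `integer(0)`, and indexing with `-integer(0)` selects zero rows instead of all of them. A small demonstration:

```
no_dups <- data.frame(x = 1:3)

# which() finds no duplicates, so it returns integer(0) ...
idx <- which(duplicated(no_dups))

# ... and negative indexing with an empty vector drops every row
wrong <- no_dups[-idx, , drop = FALSE]

# Logical indexing keeps all rows, as intended
right <- no_dups[!duplicated(no_dups), , drop = FALSE]

nrow(wrong)  # 0
nrow(right)  # 3
```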
