Find complement of a data frame (anti-join)

前端未结

I have two data frames (df and df1). df1 is a subset of df. I want to get a data frame that is the complement of df1 in df, i.e. return the rows of the first data set that are not matched in the second.

7 answers

滥情空心

2020-11-21 11:51
Late answer, but another option is a formal SQL anti-join using the sqldf package:
```
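library(sqldf)
# LEFT JOIN keeps every row of df; rows with no partner in df1 get
# NULL in t2.heads, and the WHERE clause keeps only those rows
sql <- "SELECT t1.heads
        FROM df t1 LEFT JOIN df1 t2
            ON t1.heads = t2.heads
        WHERE t2.heads IS NULL"
df2 <- sqldf(sql)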
```
The sqldf package can be useful for problems that are easily phrased in SQL but perhaps less easily phrased in base R or another R package.
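The same LEFT JOIN ... WHERE ... IS NULL logic can also be written in base R with merge(), if you would rather not load sqldf. This is a minimal sketch assuming the question's one-column df and df1; the helper column in_df1 is a hypothetical name added only to detect which rows of df found a partner.

```r
# Sample data matching the question (assumed)
df  <- data.frame(heads = c("row1", "row2", "row3", "row4", "row5"),
                  stringsAsFactors = FALSE)
df1 <- data.frame(heads = c("row3", "row5"), stringsAsFactors = FALSE)

# Tag every row of df1 so that a successful match is visible after the join
df1_tagged <- transform(df1, in_df1 = TRUE)

# LEFT JOIN: all rows of df are kept; unmatched rows get NA in in_df1
joined <- merge(df, df1_tagged, by = "heads", all.x = TRUE)

# WHERE t2.heads IS NULL: keep the unmatched rows and only df's columns
df2 <- joined[is.na(joined$in_df1), names(df), drop = FALSE]
df2
```

merge() sorts the result by the key column by default, so the row order of df2 is not guaranteed to match df.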
走了就别回头了

2020-11-21 11:53
Try anti_join() from dplyr:
```
library(dplyr)
anti_join(df, df1, by='heads')
```
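anti_join() compares only the key columns given in by and returns every column of the first data frame. The sketch below uses made-up data (an extra value column in df, and a key stored under a different name, id, in df1) to show that, using dplyr's named by vector to pair the differently named key columns.

```r
library(dplyr)

# Hypothetical data: df has an extra column, and df1 stores the key as "id"
df  <- data.frame(heads = c("row1", "row2", "row3", "row4", "row5"),
                  value = 1:5, stringsAsFactors = FALSE)
df1 <- data.frame(id = c("row3", "row5"), stringsAsFactors = FALSE)

# Keep the rows of df whose heads has no match in df1$id;
# all columns of df are returned and none from df1
result <- anti_join(df, df1, by = c("heads" = "id"))
result
```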
面向向阳花

2020-11-21 12:01
Another option, using base R and the setdiff function:
```
df2 <- data.frame(heads = setdiff(df$heads, df1$heads))
```
setdiff works exactly as you would imagine: it treats both arguments as sets and removes from the first every item that appears in the second.

I find setdiff more readable than %in%, and I prefer not to load additional packages when I don't need them, but which answer you use is largely a matter of personal taste.
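Because setdiff treats its arguments as sets, the result holds only the unique key values and only the heads column, while the %in% filter keeps every matching row of df with all of its columns. A small sketch on made-up data (a repeated key and an extra value column, both assumptions for illustration) shows the difference.

```r
# Made-up data: "row1" appears twice and df carries an extra column
df  <- data.frame(heads = c("row1", "row1", "row2", "row3"),
                  value = c(10, 11, 20, 30), stringsAsFactors = FALSE)
df1 <- data.frame(heads = "row3", stringsAsFactors = FALSE)

# setdiff: unique values of df$heads that are not in df1$heads
set_result <- data.frame(heads = setdiff(df$heads, df1$heads),
                         stringsAsFactors = FALSE)

# %in%: every row of df (duplicates and extra columns included)
# whose key is absent from df1
row_result <- df[!df$heads %in% df1$heads, , drop = FALSE]
```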
遇见更好的自我

2020-11-21 12:04
Try the %in% operator and negate it with !:
```
# drop = FALSE keeps a one-column result as a data frame
df[!df$heads %in% df1$heads, , drop = FALSE]
```
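One pitfall with this filter on a one-column data frame like the question's: row indexing with df[i, ] simplifies the result to a plain vector, because [.data.frame drops to the lowest dimension when only one column is left. A minimal sketch using the question's sample data:

```r
df  <- data.frame(heads = c("row1", "row2", "row3", "row4", "row5"),
                  stringsAsFactors = FALSE)
df1 <- data.frame(heads = c("row3", "row5"), stringsAsFactors = FALSE)

keep <- !df$heads %in% df1$heads        # TRUE for rows not found in df1

as_vector <- df[keep, ]                 # one column left: simplified to a vector
as_frame  <- df[keep, , drop = FALSE]   # stays a one-column data frame
```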
被撕碎了的回忆

2020-11-21 12:06

dplyr also has setdiff(): setdiff(bigFrame, smallFrame) returns the extra records in the first table. It compares whole rows, so both data frames must have the same columns.

So for the OP's example the code would read setdiff(df, df1).

dplyr has a lot of great functionality; for a quick, easy guide, see here.
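For the same whole-row comparison without dplyr, one base-R sketch (a different technique, not how dplyr implements it) stacks df1 on top of df and uses duplicated(). Like a set operation, it also drops rows repeated within df. The two-column data below is made up to show that a row only counts as matched when every column is equal.

```r
# Made-up data; df1 has the same columns as df
df  <- data.frame(heads = c("row1", "row2", "row3", "row4", "row5"),
                  value = c(1, 2, 3, 4, 5), stringsAsFactors = FALSE)
df1 <- data.frame(heads = c("row3", "row5"), value = c(3, 50),
                  stringsAsFactors = FALSE)

# With df1 stacked first, a row of df that equals some row of df1 in all
# columns is flagged as a duplicate (so is any repeat within df itself)
dup <- duplicated(rbind(df1, df))

# Keep the flags that belong to the rows of df, then drop the flagged rows
only_in_df <- df[!dup[nrow(df1) + seq_len(nrow(df))], , drop = FALSE]
only_in_df
```

Here row5 survives because its value differs between the two frames (5 versus 50), while row3 is removed because it matches in both columns.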


我寻月下人不归

2020-11-21 12:17

Another option is to write a function, negate_match_df, by adapting the code of match_df from the plyr package.

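```
library(plyr)

# Same as plyr::match_df(), except that the row filter is negated
negate_match_df <- function(x, y, on = NULL) {
  if (is.null(on)) {
    # Default: match on every column the two data frames share
    on <- intersect(names(x), names(y))
    message("Matching on: ", paste(on, collapse = ", "))
  }
  # join.keys() gives each row of x and y an id for its key combination
  keys <- join.keys(x, y, on)
  # Keep the rows of x whose id does not occur in y
  x[!keys$x %in% keys$y, , drop = FALSE]
}
```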

Data

```
df <- read.table(text = "heads
row1
row2
row3
row4
row5", header = TRUE)

df1 <- read.table(text = "heads
row3
row5", header = TRUE)
```

Output

```
negate_match_df(df, df1)
```
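If you would rather not depend on plyr, a rough base-R counterpart is sketched below. negate_match_base is a hypothetical name, and the row keys are built by pasting the on columns together with a separator assumed not to occur in the data; plyr's join.keys() builds integer ids instead. The two-column key data is made up for illustration.

```r
negate_match_base <- function(x, y, on = NULL) {
  if (is.null(on)) {
    # Same default as match_df: match on the columns both frames share
    on <- intersect(names(x), names(y))
    message("Matching on: ", paste(on, collapse = ", "))
  }
  # One string key per row from the 'on' columns; "\r" is assumed not to
  # appear in any value, and numbers are compared via as.character()
  row_key <- function(d) do.call(paste, c(unname(as.list(d[on])), sep = "\r"))
  # Keep the rows of x whose key is absent from y
  x[!row_key(x) %in% row_key(y), , drop = FALSE]
}

# Example with a two-column key (made-up data)
df  <- data.frame(heads = c("row1", "row2", "row3"), grp = c("a", "b", "a"),
                  stringsAsFactors = FALSE)
df1 <- data.frame(heads = c("row1", "row2"), grp = c("a", "a"),
                  stringsAsFactors = FALSE)
res <- negate_match_base(df, df1)
res
```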
