Find complement of a data frame (anti-join)

情深已故 2020-11-21 11:20

I have two data frames (df and df1). df1 is a subset of df. I want to get a data frame which is the complement of df1 in df, i.e. return the rows of the first data set which are not ma

7 Answers
  • 2020-11-21 11:51

    A late answer, but as another option we can run a formal SQL anti-join using the sqldf package:

    library(sqldf)
    sql <- "SELECT t1.heads
            FROM df t1 LEFT JOIN df1 t2
                ON t1.heads = t2.heads
            WHERE t2.heads IS NULL"
    df2 <- sqldf(sql)
    

    The sqldf package can be useful for those problems which are easily phrased using SQL logic, but perhaps less easily phrased using base R or another R package.
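For comparison, the same LEFT JOIN ... WHERE ... IS NULL logic can be approximated in base R with merge(). This is only a sketch, not what sqldf does internally. The sample data mirrors the heads column used in this thread, and the helper column name matched is an assumption introduced purely for illustration.

```r
# Sample data with the single "heads" column used in this thread
df  <- data.frame(heads = c("row1", "row2", "row3", "row4", "row5"))
df1 <- data.frame(heads = c("row3", "row5"))

# Tag every key present in df1 (the column name "matched" is hypothetical)
t2 <- unique(df1["heads"])
t2$matched <- TRUE

# Left join: all.x = TRUE keeps every row of df; unmatched rows get NA in "matched"
joined <- merge(df, t2, by = "heads", all.x = TRUE)

# The anti-join keeps only the rows that found no partner in df1
df2 <- joined[is.na(joined$matched), "heads", drop = FALSE]
```

Note that merge() sorts the result by the key column by default, so the row order may differ from the order in df.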

  • 2020-11-21 11:53

    Try anti_join from dplyr:

    library(dplyr)
    anti_join(df, df1, by='heads')
    
  • 2020-11-21 12:01

    Another option, using base R and the setdiff function:

    df2 <- data.frame(heads = setdiff(df$heads, df1$heads))
    

    setdiff works exactly as you would imagine: it treats both arguments as sets and removes every item in the second from the first.

    I find setdiff more readable than %in% and prefer not to require additional libraries when I don't need them, but which answer you use is largely a question of personal taste.
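As a small illustration of the set semantics described above, the sketch below applies base R's setdiff() to the heads column from this thread. The variable names only_in_df and dup_result are assumptions made for the example.

```r
# Sample data with the single "heads" column used in this thread
df  <- data.frame(heads = c("row1", "row2", "row3", "row4", "row5"))
df1 <- data.frame(heads = c("row3", "row5"))

# Values of df$heads that do not occur in df1$heads
only_in_df <- setdiff(df$heads, df1$heads)
df2 <- data.frame(heads = only_in_df)

# Because the inputs are treated as sets, repeated values collapse to one
dup_result <- setdiff(c("row1", "row1", "row2"), "row2")
```

One consequence worth keeping in mind: if df can contain duplicate keys that should be preserved, a row filter on the data frame may be a better fit than setdiff on a single column.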

  • 2020-11-21 12:04

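    Try the %in% operator and negate it with !. Adding drop = FALSE keeps the result a data frame even when df has only one column:

    df[!df$heads %in% df1$heads, , drop = FALSE]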
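To show how the negated %in% filter behaves when the data frame carries more columns than the key, here is a minimal sketch. The data frames big and small and the value column are hypothetical and exist only for this example.

```r
# Hypothetical two-column data, for illustration only
big   <- data.frame(heads = c("row1", "row2", "row3", "row4", "row5"),
                    value = 1:5)
small <- data.frame(heads = c("row3", "row5"))

# TRUE for the rows of big whose key is absent from small
keep <- !(big$heads %in% small$heads)

# Row subsetting keeps all columns; drop = FALSE guarantees a data frame
complement <- big[keep, , drop = FALSE]
```

The parentheses around the %in% expression are not required by R's operator precedence, but they make the negation easier to read.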
    
  • 2020-11-21 12:06

    dplyr also has setdiff(), which compares whole rows: setdiff(bigFrame, smallFrame) gets you the extra records in the first table. Both data frames need to have the same columns for this to work.

    So for the OP's example the code would read setdiff(df, df1)

    dplyr has a lot of great functionality.

  • 2020-11-21 12:17

    Another option is to create a function negate_match_df by adapting the code of match_df from the plyr package.

    library(plyr)
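    negate_match_df <- function(x, y, on = NULL) {
      # Default to the columns the two data frames share
      if (is.null(on)) {
        on <- intersect(names(x), names(y))
        message("Matching on: ", paste(on, collapse = ", "))
      }
      # join.keys() (from plyr) encodes the key columns of x and y as comparable ids
      keys <- join.keys(x, y, on)
      # Keep the rows of x whose key does not occur in y
      x[!keys$x %in% keys$y, , drop = FALSE]
    }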
    

    Data

    df <- read.table(text ="heads
    row1
    row2
    row3
    row4
    row5",header=TRUE)
    
    df1 <- read.table(text ="heads
    row3
    row5",header=TRUE)
    

    Output

    negate_match_df(df,df1)
    