I have two data frames(df and df1). df1 is subset of df. I want to get a data frame which is complement of df1 in df, i.e. return rows of the first data set which are not ma
Late answer, but for another option we can try doing a formal SQL anti join, using the sqldf
package:
library(sqldf)
sql <- "SELECT t1.heads
FROM df t1 LEFT JOIN df1 t2
ON t1.heads = t2.heads
WHERE t2.heads IS NULL"
df2 <- sqldf(sql)
The sqldf
package can be useful for those problems which are easily phrased using SQL logic, but perhaps less easily phrased using base R or another R package.
Try anti_join
from dplyr
library(dplyr)
anti_join(df, df1, by='heads')
Another option, using base R and the setdiff
function:
df2 <- data.frame(heads = setdiff(df$heads, df1$heads))
setdiff
functions exactly as you would imagine; take both arguments as sets, and remove all items in the second from the first.
I find setdiff
more readable tahtn %in%
and prefer not to require additional libraries when I don't need them, but which answer you use is largely a question of personal taste.
Try the %in%
command and reverse it with !
df[!df$heads %in% df1$heads,]
dplyr also has setdiff()
which will get you the
setdiff(bigFrame, smallFrame)
gets you the extra records in the first table.
so for the OP's example the code would read setdiff(df, df1)
dplyr has a lot of great functionality: for a quick easy guide see here.
Another option by creating a function negate_match_df
by manipulating the code of match_df
of plyr
package.
library(plyr)
negate_match_df <- function (x, y, on = NULL)
{
if (is.null(on)) {
on <- intersect(names(x), names(y))
message("Matching on: ", paste(on, collapse = ", "))
}
keys <- join.keys(x, y, on)
x[!keys$x %in% keys$y, , drop = FALSE]
}
Data
df <- read.table(text ="heads
row1
row2
row3
row4
row5",header=TRUE)
df1 <- read.table(text ="heads
row3
row5",header=TRUE)
Output
negate_match_df(df,df1)