Let's say I want to merge two different data frames using two key columns.
Data frame One has 70,000 observations of 10 variables. Data frame Two has 4,500 observations of 5 variables.
I think you can use `dplyr::anti_join` for this. From its documentation, it will "return all rows from x where there are not matching values in y, keeping just columns from x."
You'd probably have to pass your data frame Two as `x`.
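A minimal sketch of why the argument order matters. The question does not name its key columns, so `one`, `two`, `id` and `date` below are hypothetical stand-ins for data frames One and Two and their two keys:

```r
library(dplyr)

# Hypothetical stand-ins for data frames One and Two;
# `id` and `date` are assumed names for the two key columns.
one <- data.frame(id = c(1, 2, 3), date = c(10, 20, 30), v = c("x", "y", "z"),
                  stringsAsFactors = FALSE)
two <- data.frame(id = c(1, 2, 4), date = c(10, 99, 40), w = c(TRUE, FALSE, TRUE))

# Rows of `two` whose (id, date) pair has no match in `one`;
# only the columns of `two` are kept.
missing_from_one <- anti_join(two, one, by = c("id", "date"))
print(missing_from_one)

# Swapping the arguments answers the opposite question:
# rows of `one` whose (id, date) pair has no match in `two`.
missing_from_two <- anti_join(one, two, by = c("id", "date"))
print(missing_from_two)
```

Both key columns must match for a row to count as matched, so `(2, 99)` in `two` is kept even though `id = 2` also appears in `one`.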
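EDIT: as mentioned in the comments, the syntax for its `by` argument is different: key columns that have different names in `x` and `y` are paired with a named vector, `c("name_in_x" = "name_in_y")`.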
Example:
```r
df1 <- data.frame(Name = c("a", "b", "c"),
                  Date1 = c(1, 2, 3),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Name = c("a", "d"),
                  Date2 = c(1, 2),
                  stringsAsFactors = FALSE)

# Rows of df2 whose (Name, Date2) pair has no (Name, Date1) match in df1
dplyr::anti_join(df2, df1, by = c("Name" = "Name", "Date2" = "Date1"))
#>   Name Date2
#> 1    d     2
```
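For comparison, and assuming the "different syntax" refers to base R's `merge()` (the comments are not quoted here), a sketch of the two styles side by side. `merge()` lists each side's key columns separately via `by.x`/`by.y`, whereas the dplyr joins take one named vector; an unnamed element in that vector means the column has the same name on both sides. The frames `x` and `y` are illustrative only:

```r
library(dplyr)

# Illustrative frames whose second key column is named differently
# on each side (Date2 in x, Date1 in y).
x <- data.frame(Name = c("a", "d"), Date2 = c(1, 2), stringsAsFactors = FALSE)
y <- data.frame(Name = c("a", "b", "c"), Date1 = c(1, 2, 3),
                stringsAsFactors = FALSE)

# base::merge(): key columns of each side are given separately.
# This returns the rows of x that DO have a partner in y.
matched <- merge(x, y, by.x = c("Name", "Date2"), by.y = c("Name", "Date1"))

# dplyr: one vector written as c(x_col = "y_col"); the unnamed "Name"
# means the column is called Name in both x and y.
unmatched <- anti_join(x, y, by = c("Name", "Date2" = "Date1"))

print(matched)
print(unmatched)
```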