I want to remove data from a dataframe that is present in another dataframe. Let me give an example:
letters<-c('a','b','c','d','e')
numbers<-c(1
Base R Solution
list_one[!list_one$letters %in% list_two$letters2,]
gives you:
letters numbers
2 b 2
5 e 5
Explanation:
> list_one$letters %in% list_two$letters2
[1] TRUE FALSE TRUE TRUE FALSE
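This gives you a logical vector of the same length as list_one$letters, with TRUE wherever the value is also present in list_two$letters2.
! negates this vector, so you end up with TRUE only for the values that are not present in list_two$letters2.
Used as the row index, the negated vector keeps exactly those rows.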
If you have questions about how to select rows from a data.frame, enter
?`[.data.frame`
in the console and read the help page.
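To see the steps in isolation, here is a minimal self-contained sketch. The question's data is cut off above, so the two data frames below are an assumption reconstructed from the output shown (letters a to e numbered 1 to 5 in list_one, and a, c, d in list_two). The helper name present is only for illustration.

```r
# Assumed example data, reconstructed from the output shown above
list_one <- data.frame(letters = c("a", "b", "c", "d", "e"),
                       numbers = 1:5,
                       stringsAsFactors = FALSE)
list_two <- data.frame(letters2 = c("a", "c", "d"),
                       stringsAsFactors = FALSE)

# TRUE where a value of list_one$letters also occurs in list_two$letters2
present <- list_one$letters %in% list_two$letters2

# Negate the vector and use it as the row index:
# only rows that are NOT present in list_two are kept
result <- list_one[!present, ]
print(result)
```

Storing the intermediate vector in a variable is not required; it just makes it easier to inspect what %in% returns before negating it.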
A dplyr solution
library(dplyr)
# the key columns have different names, so state them explicitly
list_one %>% anti_join(list_two, by = c("letters" = "letters2"))
Answer in response to your edit: "so I really can't use the negative expression".
I guess one of the most efficient ways to do this is using data.table
as follows:
require(data.table)
setDT(list_one)
setDT(list_two)
list_one[!list_two, on=c(letters = "letters2")]
Or
require(data.table)
setDT(list_one, key = "letters")
setDT(list_two, key = "letters2")
list_one[!list_two]
(Thanks to Frank for the improvement)
Result:
letters numbers
1: b 2
2: e 5
Have a look at ?"data.table"
and at "Quickly reading very large tables as dataframes in R" for why to use data.table::fread
to read the CSV files in the first place.
BTW: If you have the vector letters2
instead of the data.table list_two,
you can use (with list_one keyed on letters as above)
list_one[!J(letters2)]