Question
I have a dataframe (df1) with multiple columns (ID, Number, Location, Field, Weight). I also have another dataframe (df2) with more information (ID, PassRate, Number, Weight).
I am trying to use dplyr and %in% to filter out rows in df1 that have the same two values as df2.
So far I have:
df_sub <- subset(df1, df1$ID %in% df2$ID & df1$Weight %in% df2$Weight)
But this is only subsetting on the first condition...any idea why?
Answer 1:
From the question and sample code, it is unclear whether you want df_sub to contain the rows in df1 that have matches in df2, or the ones without matches. dplyr::semi_join() returns the rows with matches; dplyr::anti_join() returns the rows without matches.
df_sub <- semi_join(x=df1, y=df2, by=c("ID","Weight"))
or
df_sub <- anti_join(x=df1, y=df2, by=c("ID","Weight"))
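To see the difference between the two joins, here is a minimal sketch on made-up data. The values below are hypothetical and not taken from the question; only the column names ID and Weight are.

```r
library(dplyr)

# Hypothetical sample data: only the third row matches on BOTH columns
df1 <- data.frame(ID = c(1, 2, 3), Weight = c("a", "b", "c"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(1, 2, 3), Weight = c("b", "a", "c"),
                  stringsAsFactors = FALSE)

# Rows of df1 whose (ID, Weight) pair also occurs in df2
matched <- semi_join(df1, df2, by = c("ID", "Weight"))

# Rows of df1 whose (ID, Weight) pair does not occur in df2
unmatched <- anti_join(df1, df2, by = c("ID", "Weight"))
```

Here matched should hold the single row with ID 3, and unmatched the rows with ID 1 and 2. Both joins return only the columns of df1, so nothing from df2 is added to the result.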
Answer 2:
Try this:
df1[paste0(df1$ID, df1$Weight) %in% paste0(df2$ID, df2$Weight), ]
Your code filters each column of df1 against the values in the corresponding column of df2 independently; it does not look for rows where both values match together.
Try this sample data:
df1
ID Weight
1 a
2 b
df2
ID Weight
1 b
2 a
Using your code:
df_sub <- subset(df1, df1$ID %in% df2$ID & df1$Weight %in% df2$Weight)
> df_sub
  ID Weight
1  1      a
2  2      b
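Both conditions are evaluated column by column, not row by row. Every ID in df1 appears somewhere in df2$ID and every Weight in df1 appears somewhere in df2$Weight, so both logical vectors are all TRUE and every row of df1 is kept:
TRUE TRUE
TRUE TRUE
Using mine, no rows match: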
df1[paste0(df1$ID, df1$Weight) %in% paste0(df2$ID, df2$Weight), ]
[1] ID Weight
<0 rows> (or 0-length row.names)
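One caveat with the paste0() approach: without a separator, different pairs can collapse into the same key. The sketch below uses hypothetical numeric values (not from the question) to show the collision and one way around it; the "_" separator is an assumption and should be a character that never occurs in either column.

```r
# Hypothetical data: (1, 23) and (12, 3) are different pairs
df1 <- data.frame(ID = c(1, 12), Weight = c(23, 3))
df2 <- data.frame(ID = 12, Weight = 3)

# Without a separator both pairs become the key "123", so both rows match
no_sep <- df1[paste0(df1$ID, df1$Weight) %in% paste0(df2$ID, df2$Weight), ]

# With a separator the keys are "1_23" and "12_3", so only the second row matches
with_sep <- df1[paste(df1$ID, df1$Weight, sep = "_") %in%
                  paste(df2$ID, df2$Weight, sep = "_"), ]
```

If the columns may contain arbitrary text, a join on both columns avoids the issue altogether, since it compares the columns separately instead of building a combined string.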
Source: https://stackoverflow.com/questions/45623451/dplyr-filtering-on-multiple-columns-using-in