dplyr filtering on multiple columns using “%in%”

悲&欢浪女 2021-01-29 03:43

I have a dataframe (df1) with multiple columns (ID, Number, Location, Field, Weight). I also have another dataframe (df2) with more information (ID, PassRate, Number, Weight).

2 Answers
  • 2021-01-29 04:12

    From the question and sample code, it is unclear whether you want df_sub to contain the rows of df1 that have matches in df2 or the rows that do not. dplyr::semi_join() returns the rows with matches; dplyr::anti_join() returns the rows without matches.

    df_sub <- semi_join(x=df1, y=df2, by=c("ID","Weight")) 
    

    or

    df_sub <- anti_join(x=df1, y=df2, by=c("ID","Weight")) 
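
To make the difference between the two joins concrete, here is a minimal sketch. It assumes dplyr is installed, and the two small data frames are hypothetical, made up for illustration rather than taken from the question. Only the pair (ID = 2, Weight = "b") occurs in both.

```r
library(dplyr)

# Hypothetical data: only (ID = 2, Weight = "b") appears in both frames.
df1 <- data.frame(ID = c(1, 2, 3), Weight = c("a", "b", "c"))
df2 <- data.frame(ID = c(1, 2, 4), Weight = c("b", "b", "c"))

# Rows of df1 whose (ID, Weight) pair also occurs in df2.
matched <- semi_join(df1, df2, by = c("ID", "Weight"))

# Rows of df1 whose (ID, Weight) pair does not occur in df2.
unmatched <- anti_join(df1, df2, by = c("ID", "Weight"))
```

Both joins compare ID and Weight together as a pair and return only columns from df1, so no columns from df2 are added to the result.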
    
  • 2021-01-29 04:13

    Try this:

    df1[paste0(df1$ID, df1$Weight) %in% paste0(df2$ID, df2$Weight), ]
    

    Your code filters df1 on each column separately against the values in df2. It does not check that ID and Weight match within the same row.

    Consider this sample data:

    df1 
    ID  Weight
    1   a
    2   b
    
    
    df2 
    ID  Weight
    1   b
    2   a
    

    Using your approach:

     df_sub <- subset(df1, df1$ID %in% df2$ID & df1$Weight %in% df2$Weight)
    
    
    > df_sub
      ID Weight
    1  1      a
    2  2      b
    

    In fact, each %in% test returns a logical vector that is TRUE for every row, because every ID and every Weight in df1 appears somewhere in df2. As a result, all rows of df1 are kept:

     TRUE  TRUE
     TRUE  TRUE
    

    With my approach, no rows match:

     df1[paste0(df1$ID, df1$Weight) %in% paste0(df2$ID, df2$Weight), ]
    
    [1] ID     Weight
    <0 rows> (or 0-length row.names)
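
The comparison above can be reproduced end to end with the sketch below, which uses the same sample data and base R only. One change is an assumption added here rather than part of the answer: the keys are built with paste() and a "|" separator instead of paste0(), so that pairs such as (1, "23") and (12, "3") do not collapse into the same key. This assumes "|" never occurs in the data.

```r
# Sample data from the answer above.
df1 <- data.frame(ID = c(1, 2), Weight = c("a", "b"))
df2 <- data.frame(ID = c(1, 2), Weight = c("b", "a"))

# Column-wise test: each column is checked on its own, so every row passes.
colwise <- df1$ID %in% df2$ID & df1$Weight %in% df2$Weight

# Row-wise test: build one key per row and compare the keys.
# The "|" separator is an assumption; pick one that cannot occur in the data.
key1 <- paste(df1$ID, df1$Weight, sep = "|")
key2 <- paste(df2$ID, df2$Weight, sep = "|")
rowwise <- key1 %in% key2

# Keep only the rows of df1 whose (ID, Weight) pair exists in df2.
df_sub <- df1[rowwise, ]
```

Here colwise is TRUE for both rows while rowwise is FALSE for both, which is why the column-wise filter keeps everything and the keyed filter keeps nothing.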
    