Remove same columns from left_join

野趣味 2021-01-19 01:23

I'd like to merge two data frames by id, but they both have two other columns with the same names (element and day); therefore, when I merge, I get new .x and .y columns.

4 answers
  • 2021-01-19 01:57

    Just drop everything you don't need from df2 before joining; here that means keeping only the id and value2 columns:

    left_join(df1, select(df2, c(id,value2)), by = "id")
    
    #  id     value1 element day     value2
    #1  1  1.2276303   TEST1  15 -0.1389861
    #2  2 -0.8017795   TEST1  15 -0.5973131
    #3  3 -1.0803926   TEST1  15 -2.1839668
    #4  4 -0.1575344   TEST1  15  0.2408173
    #5  5 -1.0717600   TEST1  15 -0.2593554
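A minimal sketch of this approach, using small made-up data frames (the values are hypothetical) with the same shape as in the question: element and day appear in both tables, so an id-only join suffixes them, while trimming df2 first does not. Dropping the duplicated columns with select(df2, -element, -day) is an equivalent way to write the same trim.

```r
library(dplyr)

# Hypothetical data: element and day are shared, non-key columns
df1 <- data.frame(id = 1:3, value1 = c(1.5, 2.5, 3.5), element = "TEST1", day = 15)
df2 <- data.frame(id = 1:3, value2 = c(10, 20, 30), element = "TEST1", day = 15)

# Joining on id alone keeps both copies of element and day, with .x / .y suffixes
untrimmed <- left_join(df1, df2, by = "id")
names(untrimmed)
# id, value1, element.x, day.x, value2, element.y, day.y

# Keep only the key and the new column from df2 ...
kept <- left_join(df1, select(df2, id, value2), by = "id")
# ... or, equivalently, drop the duplicated columns from df2
dropped <- left_join(df1, select(df2, -element, -day), by = "id")

names(kept)
# id, value1, element, day, value2
```

Both spellings give the same result here; keeping columns is safer if df2 may later gain more overlapping columns.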
    

    Beware that these answers are not all equivalent, so ask yourself what result you actually need. For example:

    library(dplyr)
    df1 <- data.frame(id=1:3,day=2:4,element=3:5,value1=100:102)
    df2 <- data.frame(id=1:3,day=3:5,element=4:6,value2=200:202)
    df1
    
    #  id day element value1
    #1  1   2       3    100
    #2  2   3       4    101
    #3  3   4       5    102
    
    df2
    #  id day element value2
    #1  1   3       4    200
    #2  2   4       5    201
    #3  3   5       6    202
    
    left_join(df1, df2)
    #Joining by: c("id", "day", "element")
    #  id day element value1 value2
    #1  1   2       3    100     NA
    #2  2   3       4    101     NA
    #3  3   4       5    102     NA
    
    left_join(df1, select(df2, c(id,value2)), by = "id")
    #  id day element value1 value2
    #1  1   2       3    100    200
    #2  2   3       4    101    201
    #3  3   4       5    102    202
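To see before joining how much the choice of key matters, one option (a sketch, not part of the original answer) is to count how many df1 rows find a partner in df2 under each key with dplyr's semi_join(), using the same df1 and df2 as above.

```r
library(dplyr)

df1 <- data.frame(id = 1:3, day = 2:4, element = 3:5, value1 = 100:102)
df2 <- data.frame(id = 1:3, day = 3:5, element = 4:6, value2 = 200:202)

# How many df1 rows have a match in df2 under each choice of key?
n_by_id  <- nrow(semi_join(df1, df2, by = "id"))
n_by_all <- nrow(semi_join(df1, df2, by = c("id", "day", "element")))

# n_by_id is 3 (every id matches), n_by_all is 0 (no full id/day/element match),
# which is why the natural join above fills value2 with NA
c(by_id = n_by_id, by_all = n_by_all)
```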
    
  • 2021-01-19 02:01
    df <- left_join(df1, df2, by = c("id", "element", "day"))
    
  • 2021-01-19 02:07

    You only need:

    df <- left_join(df1, df2)
    

    With by = NULL (the default), left_join() performs a natural join, using all variables with common names across the two tables. A message lists the variables so that you can check they're right.

    Output:

    Joining by: c("id", "element", "day")
      id     value1 element day     value2
    1  1 -0.6264538   TEST1  15 -0.8204684
    2  2  0.1836433   TEST1  15  0.4874291
    3  3 -0.8356286   TEST1  15  0.7383247
    4  4  1.5952808   TEST1  15  0.5757814
    5  5  0.3295078   TEST1  15 -0.3053884
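A short sketch of the same idea, assuming the set.seed(1) data from the Data section of this answer: intersect() lists the shared column names, and passing them explicitly to by should give the same result as the natural join, without the message.

```r
library(dplyr)

set.seed(1)
df1 <- data.frame(id = seq(1, 5), value1 = rnorm(5), element = "TEST1", day = 15)
df2 <- data.frame(id = seq(1, 5), value2 = rnorm(5), element = "TEST1", day = 15)

# Column names the two tables share, in df1's column order
key <- intersect(names(df1), names(df2))
key
# id, element, day

# by = NULL (the default) joins on the same shared columns, but prints a message
natural  <- suppressMessages(left_join(df1, df2))
explicit <- left_join(df1, df2, by = key)
all.equal(natural, explicit)
```

Writing the key out (or computing it as above) keeps the join stable if a column with a clashing name is later added to one of the tables.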
    

    It's worth pointing out the comment by thelatemail: "Joining on id is not the same as joining on id/element/day". However, in this specific example, element and day are the same for all records in both tables, so we get the same result.
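One way to check that claim on this data (an illustration, not part of the original answer), using the set.seed(1) data from the Data section: after the id-only join the .x and .y copies agree on every row, and dropping the .y copies reproduces the three-column join.

```r
library(dplyr)

set.seed(1)
df1 <- data.frame(id = seq(1, 5), value1 = rnorm(5), element = "TEST1", day = 15)
df2 <- data.frame(id = seq(1, 5), value2 = rnorm(5), element = "TEST1", day = 15)

by_id  <- left_join(df1, df2, by = "id")
by_all <- left_join(df1, df2, by = c("id", "element", "day"))

# The duplicated columns agree row by row, so the .y copies add nothing
copies_agree <- all(by_id$element.x == by_id$element.y) &&
  all(by_id$day.x == by_id$day.y)

# Dropping the .y copies and renaming the .x copies gives the three-column join
collapsed <- by_id %>%
  select(-element.y, -day.y) %>%
  rename(element = element.x, day = day.x)
all.equal(collapsed, by_all)
```

With the df1/df2 from the first example in this thread (where day and element differ), copies_agree would be FALSE and the two joins would not match.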

    Original result

    Data

    library(dplyr)
    set.seed(1)
    df1 <- data.frame(id = seq(1,5), value1 = rnorm(5), element = "TEST1", day = 15)
    df2 <- data.frame(id = seq(1,5), value2 = rnorm(5), element = "TEST1", day = 15)
    df <- left_join(df1, df2, by = "id")
    

    Output:

      id     value1 element.x day.x     value2 element.y day.y
    1  1 -0.6264538     TEST1    15 -0.8204684     TEST1    15
    2  2  0.1836433     TEST1    15  0.4874291     TEST1    15
    3  3 -0.8356286     TEST1    15  0.7383247     TEST1    15
    4  4  1.5952808     TEST1    15  0.5757814     TEST1    15
    5  5  0.3295078     TEST1    15 -0.3053884     TEST1    15
    
  • 2021-01-19 02:08

    After checking that these columns are indeed identical, you could simply remove them from one data frame before doing the join:

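    # Row-by-row check: assumes df1 and df2 hold the same ids in the same order
    if (all(df1[, c("element", "day")] == df2[, c("element", "day")])) {
      df <- left_join(df1[, setdiff(colnames(df1), c("element", "day"))], df2, by = "id")
    } else {
      # braces are required: a bare "else" on its own line is a syntax error at top level
      stop("element/day differ between df1 and df2")
    }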
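The check above compares rows by position, so it assumes both tables list the same ids in the same order. A sketch of a more general variant, written as a hypothetical helper left_join_drop_shared() (not a dplyr function): it matches rows by key with inner_join() before comparing, then drops the shared non-key columns from the second table and joins.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): left-join y onto x by `by`, dropping
# non-key columns of y that also exist in x, after checking they agree per key.
left_join_drop_shared <- function(x, y, by) {
  shared <- setdiff(intersect(names(x), names(y)), by)
  if (length(shared) > 0) {
    # Match rows by key instead of relying on both tables having the same row order
    chk <- inner_join(x[c(by, shared)], y[c(by, shared)], by = by, suffix = c(".x", ".y"))
    for (col in shared) {
      a <- chk[[paste0(col, ".x")]]
      b <- chk[[paste0(col, ".y")]]
      # NA comparisons count as a mismatch here
      if (!isTRUE(all(a == b))) stop("column '", col, "' differs between x and y")
    }
  }
  left_join(x, y[setdiff(names(y), shared)], by = by)
}

# Usage with the set.seed(1) data from the answer above
set.seed(1)
df1 <- data.frame(id = seq(1, 5), value1 = rnorm(5), element = "TEST1", day = 15)
df2 <- data.frame(id = seq(1, 5), value2 = rnorm(5), element = "TEST1", day = 15)
res <- left_join_drop_shared(df1, df2, by = "id")
names(res)
# id, value1, element, day, value2

# A table whose day values disagree is rejected instead of silently dropped
df3 <- df2
df3$day <- 16
try(left_join_drop_shared(df1, df3, by = "id"))
```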
    