Merge (join) data frames - too many rows in result

前端 未结 3 1724
萌比男神i
萌比男神i 2021-01-16 19:50

I have two data frames(df1 and df2). I want to join them using merge function.

df1 has 3903 rows and df2 has 351 rows.

I want to left join df2 to df1 by a c

3条回答
  •  一向
    一向 (楼主)
    2021-01-16 20:34

    To make sure that your second data frame is unique on the join column(s), you can use my package safejoin (a wrapper around dplyr's join functions) which will give you an explicit error if it's not the case.

    Current situation :

    df1 <- data.frame(column1 = c("a","b","b"), X = 1:3)
    df2 <- data.frame(column1 = c("a","b"), Y = 4:5)
    df3 <- data.frame(column1 = c("a","a","b"), Y = 4:6)
    
    merge(df1,df2, by="column1",all.x=TRUE)
    #   column1 X Y
    # 1       a 1 4
    # 2       b 2 5
    # 3       b 3 5
    
    merge(df1,df3, by="column1",all.x=TRUE)
    #   column1 X Y
    # 1       a 1 4
    # 2       a 1 5
    # 3       b 2 6
    # 4       b 3 6
    

    Some values were duplicated by mistake.

    Using safejoin :

    # devtools::install_github("moodymudskipper/safejoin")
    library(safejoin)
    safe_left_join(df1, df2, check= "V")
    #   column1 X Y
    # 1       a 1 4
    # 2       b 2 5
    # 3       b 3 5
    
    safe_left_join(df1, df3, check= "V")
    # Error: y is not unique on column1
    # Call `rlang::last_error()` to see a backtrace
    

    check = "V" controls that the join columns are unique on the right hand side (check = "U" like Unique checks that they are unique on the left hand side, "V" is the next letter in the alphabet).

提交回复
热议问题