I have two data frames(df1 and df2). I want to join them using merge function.
df1 has 3903 rows and df2 has 351 rows.
I want to left join df2 to df1 by a c
To make sure that your second data frame is unique on the join column(s), you can use my package safejoin (a wrapper around dplyr
's join functions) which will give you an explicit error if it's not the case.
Current situation :
df1 <- data.frame(column1 = c("a","b","b"), X = 1:3)
df2 <- data.frame(column1 = c("a","b"), Y = 4:5)
df3 <- data.frame(column1 = c("a","a","b"), Y = 4:6)
merge(df1,df2, by="column1",all.x=TRUE)
# column1 X Y
# 1 a 1 4
# 2 b 2 5
# 3 b 3 5
merge(df1,df3, by="column1",all.x=TRUE)
# column1 X Y
# 1 a 1 4
# 2 a 1 5
# 3 b 2 6
# 4 b 3 6
Some values were duplicated by mistake.
Using safejoin :
# devtools::install_github("moodymudskipper/safejoin")
library(safejoin)
safe_left_join(df1, df2, check= "V")
# column1 X Y
# 1 a 1 4
# 2 b 2 5
# 3 b 3 5
safe_left_join(df1, df3, check= "V")
# Error: y is not unique on column1
# Call `rlang::last_error()` to see a backtrace
check = "V"
controls that the join columns are unique on the right hand side (check = "U"
like Unique checks that they are unique on the left hand side, "V"
is the next letter in the alphabet).