I'd like to merge two data frames by id, but they both have two other columns with the same names; therefore, when I merge I get new .x and .y columns.
Just drop everything you don't want from df2 before joining. In this case, keep only the id and value2 columns:
left_join(df1, select(df2, c(id,value2)), by = "id")
# id value1 element day value2
#1 1 1.2276303 TEST1 15 -0.1389861
#2 2 -0.8017795 TEST1 15 -0.5973131
#3 3 -1.0803926 TEST1 15 -2.1839668
#4 4 -0.1575344 TEST1 15 0.2408173
#5 5 -1.0717600 TEST1 15 -0.2593554
Beware that these answers are not all equivalent, so ask yourself what result you actually need. For example:
df1 <- data.frame(id=1:3,day=2:4,element=3:5,value1=100:102)
df2 <- data.frame(id=1:3,day=3:5,element=4:6,value2=200:202)
df1
# id day element value1
#1 1 2 3 100
#2 2 3 4 101
#3 3 4 5 102
df2
# id day element value2
#1 1 3 4 200
#2 2 4 5 201
#3 3 5 6 202
left_join(df1, df2)
#Joining by: c("id", "day", "element")
# id day element value1 value2
#1 1 2 3 100 NA
#2 2 3 4 101 NA
#3 3 4 5 102 NA
left_join(df1, select(df2, c(id,value2)), by = "id")
# id day element value1 value2
#1 1 2 3 100 200
#2 2 3 4 101 201
#3 3 4 5 102 202
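Before choosing a join key, it can help to check which shared columns actually agree between the two tables. The following is a minimal sketch in base R, reusing the toy df1 and df2 from this example; the names shared and agree are made up for illustration, and the check assumes both tables hold the same rows in the same order.

```r
# Toy data copied from the example above
df1 <- data.frame(id = 1:3, day = 2:4, element = 3:5, value1 = 100:102)
df2 <- data.frame(id = 1:3, day = 3:5, element = 4:6, value2 = 200:202)

# Columns present in both tables; a natural join would use all of them as keys
shared <- intersect(names(df1), names(df2))

# For each shared column, check whether the two tables agree row by row.
# Assumption: both tables have the same rows in the same order.
agree <- sapply(shared, function(col) identical(df1[[col]], df2[[col]]))
print(agree)
```

Here only id agrees, which is why joining on all shared columns gives NA for value2 while joining on id alone does not.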
df <- left_join(df1, df2, by = c("id", "element", "day"))
You only need:
df <- left_join(df1, df2)
With by = NULL, the default, the join will be a natural join, using all variables with common names across the two tables. A message lists the variables so that you can check they're right.
Output:
Joining by: c("id", "element", "day")
id value1 element day value2
1 1 -0.6264538 TEST1 15 -0.8204684
2 2 0.1836433 TEST1 15 0.4874291
3 3 -0.8356286 TEST1 15 0.7383247
4 4 1.5952808 TEST1 15 0.5757814
5 5 0.3295078 TEST1 15 -0.3053884
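As a quick sanity check, the natural join can be compared against the join with the keys spelled out. This is only a sketch, assuming dplyr is installed and using the data from this answer; natural and explicit are illustrative names.

```r
library(dplyr)

# Same data as in this answer
set.seed(1)
df1 <- data.frame(id = seq(1, 5), value1 = rnorm(5), element = "TEST1", day = 15)
df2 <- data.frame(id = seq(1, 5), value2 = rnorm(5), element = "TEST1", day = 15)

# Natural join: by is left as NULL, so all commonly named columns become keys
natural <- left_join(df1, df2)

# The same join with the keys written out explicitly
explicit <- left_join(df1, df2, by = c("id", "element", "day"))

# Both calls should give the same data frame, with no .x / .y columns
print(isTRUE(all.equal(natural, explicit)))
print(names(natural))
```

Writing the keys out avoids the message and makes the intent visible to whoever reads the code later.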
It's worth pointing out the comment by thelatemail: "Joining on id is not the same as joining on id/element/day". However, in this specific example, because element and day are the same for all records in both tables, we get the same result.
Original result
Data
set.seed(1)
df1 <- data.frame(id = seq(1,5), value1 = rnorm(5), element = "TEST1", day = 15)
df2 <- data.frame(id = seq(1,5), value2 = rnorm(5), element = "TEST1", day = 15)
df <- left_join(df1, df2, by = "id")
Output:
id value1 element.x day.x value2 element.y day.y
1 1 -0.6264538 TEST1 15 -0.8204684 TEST1 15
2 2 0.1836433 TEST1 15 0.4874291 TEST1 15
3 3 -0.8356286 TEST1 15 0.7383247 TEST1 15
4 4 1.5952808 TEST1 15 0.5757814 TEST1 15
5 5 0.3295078 TEST1 15 -0.3053884 TEST1 15
After checking that these columns are indeed the same in both tables, you could just remove them from one table before doing the join:
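# Braces are needed so that `else` parses when this is run at top level
if (all(df1[, c("element", "day")] == df2[, c("element", "day")])) {
  # Drop the duplicated columns from df1, then join on id only
  df <- left_join(df1[, setdiff(colnames(df1), c("element", "day"))], df2, by = "id")
} else {
  stop("Should not happen!?")
}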