Question
My problem is the following: Let's say I have an existing data frame with the columns UID, foo, and result. The result column is already partially filled. A second model now predicts results for additional rows, producing a second data frame with a UID and a result column (code to reproduce is at the bottom):
## df_main
## UID foo result
## <dbl> <chr> <chr>
## 1 1 moo Cow
## 2 2 rum <NA>
## 3 3 oink <NA>
## 4 4 woof Dog
## 5 5 hiss <NA>
## new_prediction
## UID result
## <dbl> <chr>
## 1 3 Pig
## 2 5 Snake
I now want to left_join the new results by UID to get the following result column:
## Cow
## <NA>
## Pig
## Dog
## Snake
But I can't get that to work, since left_join(df_main, new_prediction, by="UID") creates result.x and result.y. Is there any way to do this with dplyr, or alternatively, a good second step to merge the two columns? I looked at various functions, but finally resorted to looping over all rows manually. I am pretty sure there is a more "R" way to do this.
Code for dataframes:
library(dplyr)  # provides tibble(), left_join(), coalesce() and %>%
df_main <- tibble(UID = c(1, 2, 3, 4, 5), foo = c("moo", "rum", "oink", "woof", "hiss"), result = c("Cow", NA, NA, "Dog", NA))
new_prediction <- tibble(UID = c(3, 5), result = c("Pig", "Snake"))
Answer 1:
coalesce is your second step.
left_join(df_main, new_prediction, by = "UID") %>%
  mutate(result = coalesce(result.x, result.y)) %>%
  select(-result.x, -result.y)
# A tibble: 5 x 3
# UID foo result
# <dbl> <chr> <chr>
# 1 1 moo Cow
# 2 2 rum <NA>
# 3 3 oink Pig
# 4 4 woof Dog
# 5 5 hiss Snake
coalesce will accept as many columns as you give it. Earlier columns take precedence: for each row, the first non-missing value is used.
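To illustrate the precedence rule, here is a small sketch with three made-up vectors (first, second and third are hypothetical names, not part of the question's data). It assumes dplyr is installed.

```r
library(dplyr)

# Three hypothetical vectors with missing values in different positions
first  <- c("Cow",  NA,    NA,      NA)
second <- c("Bull", "Pig", NA,      NA)
third  <- c("Calf", "Hog", "Snake", NA)

# For each position, coalesce() keeps the first non-missing value,
# checking the arguments from left to right
combined <- coalesce(first, second, third)
print(combined)
```

Position 1 comes from first, position 2 from second, position 3 from third, and position 4 stays NA because every input is missing there.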
Answer 2:
Adding to Gregor's answer of using coalesce, you could also "manually" combine the columns with ifelse.
left_join(df_main, new_prediction, by = "UID") %>%
  mutate(result = ifelse(is.na(result.x), result.y, result.x)) %>%
  select(-c(result.x, result.y))
# A tibble: 5 x 3
# UID foo result
# <dbl> <chr> <chr>
# 1 1.00 moo Cow
# 2 2.00 rum <NA>
# 3 3.00 oink Pig
# 4 4.00 woof Dog
# 5 5.00 hiss Snake
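As a quick sanity check, the two approaches from the answers can be run side by side on the question's data. This is only a sketch; the names joined, via_coalesce and via_ifelse are made up for the comparison.

```r
library(dplyr)

# Data from the question
df_main <- tibble(
  UID = c(1, 2, 3, 4, 5),
  foo = c("moo", "rum", "oink", "woof", "hiss"),
  result = c("Cow", NA, NA, "Dog", NA)
)
new_prediction <- tibble(UID = c(3, 5), result = c("Pig", "Snake"))

# The join produces result.x (from df_main) and result.y (from new_prediction)
joined <- left_join(df_main, new_prediction, by = "UID")

# Approach from answer 1: coalesce()
via_coalesce <- joined %>%
  mutate(result = coalesce(result.x, result.y)) %>%
  select(-result.x, -result.y)

# Approach from answer 2: ifelse()
via_ifelse <- joined %>%
  mutate(result = ifelse(is.na(result.x), result.y, result.x)) %>%
  select(-c(result.x, result.y))

# Compare the merged result columns
print(identical(via_coalesce$result, via_ifelse$result))
```

Both versions keep the existing value where there is one and fill in the prediction only where result.x is NA, so the result columns should match on this data.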
Source: https://stackoverflow.com/questions/48284524/left-join-r-dataframes-merging-two-columns-with-nas