What is the most efficient way to update/replace NAs in a main dataset with the (correct) values from a lookup table? This is such a common operation! Similar questions do not seem

There's currently no one-shot function for coalescing more than one column after a join (a single column can be handled with a lookup-table approach inside `ifelse(is.na(value), ..., value)`), though there has been discussion of how such behavior might be implemented. For now, you can build it manually. If you've got a lot of columns, you can coalesce them programmatically, or even wrap the logic in a function.
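For a single column, the `ifelse(is.na(value), ..., value)` lookup approach could look something like the base R sketch below. It uses `match()` to find each row's replacement in the lookup table and assumes the key (`state_abbrev`) is unique there; the data frames simply mirror the example data used in the tidyverse code.

```r
# Main data with three missing state names, plus a lookup table of fixes
df1 <- data.frame(
  state_abbrev = state.abb[1:10],
  state_name = c(state.name[1:5], rep(NA, 3), state.name[9:10]),
  stringsAsFactors = FALSE
)
lookup_df <- data.frame(
  state_abbrev = state.abb[6:8],
  state_name = state.name[6:8],
  stringsAsFactors = FALSE
)

# Position of each df1 key in the lookup table (NA where there is no match)
idx <- match(df1$state_abbrev, lookup_df$state_abbrev)

# Keep existing values; fill only the NAs from the lookup table
df1$state_name <- ifelse(is.na(df1$state_name),
                         lookup_df$state_name[idx],
                         df1$state_name)
```

This is fine for one column, but every extra column needs its own `ifelse()` line, which is why the join-and-coalesce approach scales better.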
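```r
library(tidyverse)

df1 <- tibble(
  state_abbrev = state.abb[1:10],
  state_name = c(state.name[1:5], rep(NA, 3), state.name[9:10]),
  value = sample(500:1200, 10, replace = TRUE)
)

lookup_df <- tibble(
  state_abbrev = state.abb[6:8],
  state_name = state.name[6:8]
)

df1 %>%
  # left_join keeps every row of df1 and adds no rows for unmatched lookup keys;
  # shared non-key columns come back suffixed as .x (df1) and .y (lookup_df)
  left_join(lookup_df, by = 'state_abbrev') %>%
  # for each .x column, coalesce it with its .y partner under the unsuffixed name
  # (regexes are escaped and anchored so only a literal trailing ".x" matches)
  bind_cols(map_dfc(grep('\\.x$', names(.), value = TRUE), function(x){
    set_names(
      list(coalesce(.[[x]], .[[sub('\\.x$', '.y', x)]])),
      sub('\\.x$', '', x)
    )
  })) %>%
  # drop the suffixed helper columns and restore the original column order
  select(union(names(df1), names(lookup_df)))
#> # A tibble: 10 x 3
#>    state_abbrev state_name  value
#>    <chr>        <chr>       <int>
#>  1 AL           Alabama       877
#>  2 AK           Alaska       1048
#>  3 AZ           Arizona       973
#>  4 AR           Arkansas      860
#>  5 CA           California    938
#>  6 CO           Colorado      639
#>  7 CT           Connecticut   547
#>  8 DE           Delaware      672
#>  9 FL           Florida       667
#> 10 GA           Georgia      1142
```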
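To put it in a function, one possible sketch follows. `coalesce_join()` is a hypothetical helper name, not part of dplyr. It assumes `by` is a plain character vector of key columns present in both tables and that those keys are unique in the lookup table (duplicate keys would duplicate rows in the join). Instead of `map_dfc()`/`bind_cols()`, it loops over the columns the two tables share.

```r
library(dplyr)

# Hypothetical helper: left-join `lookup` onto `x`, then fill NAs in every
# shared non-key column of `x` with the matching value from `lookup`.
coalesce_join <- function(x, lookup, by) {
  joined <- left_join(x, lookup, by = by, suffix = c(".x", ".y"))
  # Non-key columns present in both tables are the ones the join suffixed
  shared <- setdiff(intersect(names(x), names(lookup)), by)
  for (col in shared) {
    joined[[col]] <- coalesce(joined[[paste0(col, ".x")]],
                              joined[[paste0(col, ".y")]])
  }
  # Drop the suffixed columns; keep x's columns first, then any lookup-only ones
  joined[union(names(x), names(lookup))]
}

# Usage with the same example data
df1 <- tibble(
  state_abbrev = state.abb[1:10],
  state_name = c(state.name[1:5], rep(NA, 3), state.name[9:10]),
  value = sample(500:1200, 10, replace = TRUE)
)
lookup_df <- tibble(
  state_abbrev = state.abb[6:8],
  state_name = state.name[6:8]
)
result <- coalesce_join(df1, lookup_df, by = "state_abbrev")
```

Because `coalesce()` takes the first non-missing value, existing values in `df1` always win over the lookup table; swap the two arguments if the lookup should take precedence instead.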