Update/Replace Values in Dataframe with Tidyverse Join

说谎 2021-01-01 01:29

What is the most efficient way to update/replace NAs in the main dataset with (correct) values from a lookup table? This is such a common operation! Similar questions do not seem

5 Answers
  •  野趣味 (OP)
    2021-01-01 01:50

    There's currently no one-shot way to coalesce more than one column after a join, though there has been discussion of how such behavior might be implemented. (A single column can be handled with a lookup-table approach inside ifelse(is.na(value), ..., value).) For now, you can build it manually: join the two tables, then coalesce each pair of .x/.y columns. If you've got a lot of columns, you can coalesce programmatically, or even put it in a function.
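
    For the single-column case, a minimal sketch might look like the following. The tables main and lookup and their columns are made up for illustration and are not from the question; both variants should give the same filled-in column.

```r
library(dplyr)

# Hypothetical example data (illustrative only)
main <- tibble(
    id   = c("a", "b", "c", "d"),
    name = c("Alpha", NA, "Gamma", NA)
)
lookup <- tibble(
    id   = c("b", "d"),
    name = c("Beta", "Delta")
)

# Variant 1: look up the replacement with match() inside ifelse()
fixed_ifelse <- main %>%
    mutate(name = ifelse(is.na(name), lookup$name[match(id, lookup$id)], name))

# Variant 2: left join, then coalesce the resulting .x/.y pair
fixed_join <- main %>%
    left_join(lookup, by = "id") %>%
    mutate(name = coalesce(name.x, name.y)) %>%
    select(id, name)
```

    Variant 2 is the pattern that the multi-column code generalizes.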

    library(tidyverse)
    
    df1 <- tibble(
        state_abbrev = state.abb[1:10],
        state_name = c(state.name[1:5], rep(NA, 3), state.name[9:10]),
        value = sample(500:1200, 10, replace=TRUE)
    )
    
    lookup_df <- tibble(
        state_abbrev = state.abb[6:8],
        state_name = state.name[6:8]
    )
    
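    # Join, then coalesce every .x/.y column pair back into a single column.
    # The patterns are anchored and escaped so only a literal ".x" suffix matches.
    df1 %>% 
        full_join(lookup_df, by = 'state_abbrev') %>% 
        bind_cols(map_dfc(grep('\\.x$', names(.), value = TRUE), function(x){
            set_names(
                # keep the value from df1 (.x); fall back to lookup_df (.y) when it is NA
                list(coalesce(.[[x]], .[[sub('\\.x$', '.y', x)]])), 
                sub('\\.x$', '', x)
            )
        })) %>% 
        # drop the suffixed helper columns
        select(union(names(df1), names(lookup_df)))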
    #> # A tibble: 10 x 3
    #>    state_abbrev state_name  value
    #>    <chr>        <chr>       <int>
    #>  1 AL           Alabama       877
    #>  2 AK           Alaska       1048
    #>  3 AZ           Arizona       973
    #>  4 AR           Arkansas      860
    #>  5 CA           California    938
    #>  6 CO           Colorado      639
    #>  7 CT           Connecticut   547
    #>  8 DE           Delaware      672
    #>  9 FL           Florida       667
    #> 10 GA           Georgia      1142
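
    If you'd rather put it in a function, here is one possible sketch. The name coalesce_join and its arguments are hypothetical (this is not a dplyr function), and it assumes the key columns have the same names in both tables.

```r
library(dplyr)

# Hypothetical helper: join x and y, then fill NAs in x's columns
# from the same-named columns of y. Not part of dplyr.
coalesce_join <- function(x, y, by, suffix = c(".x", ".y"),
                          join = dplyr::left_join) {
    joined <- join(x, y, by = by, suffix = suffix)
    # Non-key columns present in both inputs get suffixed by the join
    shared <- setdiff(intersect(names(x), names(y)), by)
    for (col in shared) {
        joined[[col]] <- coalesce(
            joined[[paste0(col, suffix[1])]],
            joined[[paste0(col, suffix[2])]]
        )
    }
    # Keep only the original column names, dropping the suffixed helpers
    joined[, union(names(x), names(y)), drop = FALSE]
}

# Small made-up example
main <- tibble(
    state_abbrev = c("CA", "CO", "CT"),
    state_name   = c("California", NA, NA),
    value        = c(1L, 2L, 3L)
)
lookup <- tibble(
    state_abbrev = c("CO", "CT"),
    state_name   = c("Colorado", "Connecticut")
)

result <- coalesce_join(main, lookup, by = "state_abbrev")
```

    The join defaults to left_join here so that rows appearing only in the lookup table are not added to the main data; pass join = dplyr::full_join to mirror the code above. Recent dplyr versions (1.0.0 and later) also have rows_patch(), which overwrites only NA values in matching rows and may cover this case without a custom helper.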
    
