Recode related values in an efficient way

Question

I have a data frame df with a single variable, var, whose values are related to one another.

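library(dplyr)
# stringsAsFactors = TRUE gives the factor column shown below (character is the default since R 4.0.0)
df <- data.frame(var = c(rep('AUS', 12), rep('NZ', 12), rep('ENG', 7), rep('SOC', 12),
                         rep('PAK', 11), rep('SRI', 17), rep('IND', 15)),
                 stringsAsFactors = TRUE)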

df %>% count(var)
# # A tibble: 7 x 2
#      var     n
#   <fctr> <int>
# 1    AUS    12
# 2    ENG     7
# 3    IND    15
# 4     NZ    12
# 5    PAK    11
# 6    SOC    12
# 7    SRI    17

Based on these relations, several values should be recoded to a shared new value:

df %>% mutate(var = recode(var, 'AUS' = 'A', 'NZ' = 'A', 'ENG' = 'A', 
                           'SOC' = 'A', 'PAK' = 'B', 'SRI' = 'B')) %>% count(var)
# # A tibble: 3 x 2
#      var     n
#   <fctr> <int>
# 1      A    43
# 2    IND    15
# 3      B    28

Here A replaces four values and B replaces two. The code above already gives the expected result. However, is there a more efficient way to do this that does not require writing the new value once for every old value it replaces (four times for A, twice for B)?
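A minimal base-R sketch of one way to avoid repeating A and B: write each new value once, next to the old values it replaces, and invert that list into a named lookup vector. The names groups, lookup, old and new are hypothetical, chosen only for this illustration; values that are not listed (here IND) are kept unchanged.

```r
# Hypothetical grouping: each new code is written once, followed by the old values it replaces
groups <- list(A = c('AUS', 'NZ', 'ENG', 'SOC'),
               B = c('PAK', 'SRI'))

# Invert the list into a named lookup vector: names are old values, elements are new values
lookup <- setNames(rep(names(groups), lengths(groups)),
                   unlist(groups, use.names = FALSE))

df <- data.frame(var = c(rep('AUS', 12), rep('NZ', 12), rep('ENG', 7), rep('SOC', 12),
                         rep('PAK', 11), rep('SRI', 17), rep('IND', 15)))

# Look up every value by name; unmatched values (here 'IND') come back as NA
old <- as.character(df$var)
new <- unname(lookup[old])

# Keep the original value wherever the lookup had no entry
df$var <- ifelse(is.na(new), old, new)

table(df$var)
```

If you prefer to stay in dplyr, the same named vector can be spliced into recode() with !!!, as in mutate(var = recode(var, !!!lookup)); recode() leaves values that are not in the lookup unchanged.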

Answer 1:

One way to do this is to use a named vector as a lookup table.

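# Lookup table: names are the old values, elements are the new values
Codes <- c(rep('A', 4), rep('B', 2), 'IND')
names(Codes) <- c('AUS', 'NZ', 'ENG', 'SOC', 'PAK', 'SRI', 'IND')
# Index by character: indexing with a factor would use its integer codes instead of its labels
df$var <- unname(Codes[as.character(df$var)])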
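The answer converts the column with as.character() before indexing, and it lists 'IND' in Codes even though IND is not recoded. A short sketch of why both matter, reusing the answer's Codes vector on a small hypothetical factor x: indexing a vector with a factor uses the factor's integer codes rather than its labels, and a name that is missing from the lookup returns NA.

```r
# Named lookup vector as in the answer
Codes <- c(rep('A', 4), rep('B', 2), 'IND')
names(Codes) <- c('AUS', 'NZ', 'ENG', 'SOC', 'PAK', 'SRI', 'IND')

# Hypothetical factor; its levels are sorted: AUS = 1, IND = 2, SRI = 3
x <- factor(c('SRI', 'AUS', 'IND'))

# Character indexing matches by name
by_name <- unname(Codes[as.character(x)])   # "B" "A" "IND"

# Factor indexing uses the integer codes 3, 1, 2, i.e. positions in Codes, not names
by_code <- unname(Codes[x])                  # "A" "A" "A"

# A value absent from the lookup yields NA, which is why 'IND' maps to itself in Codes
unmatched <- unname(Codes[c('AUS', 'USA')])  # "A" NA
```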

Source: https://stackoverflow.com/questions/46527758/recode-related-values-in-an-efficient-way

Tags: dplyr, recode
