R: factor levels, recode rest to 'other'

夕颜 2021-02-08 03:28

I use factors somewhat infrequently and generally find them comprehensible, but I often am fuzzy about the details for specific operations. Currently, I am coding/collapsing cat

4 answers
  •  -上瘾入骨i
    2021-02-08 04:11

    You can use forcats::fct_other():

    library(forcats)
    data$naics <- fct_other(data$naics, keep = top8, other_level = 'other')
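
    The question's definition of top8 is not shown here, so the following is only a sketch under assumptions: the toy data frame, the made-up codes, and the "eight most frequent levels" rule for top8 are all hypothetical stand-ins.

```r
library(forcats)

# Hypothetical toy data: 12 made-up codes with frequencies 12, 11, ..., 1
codes <- sprintf("6211%02d", 1:12)
data <- data.frame(
  employees = seq_len(sum(12:1)),
  naics = factor(rep(codes, times = 12:1))
)

# Assumed definition of top8: the eight most frequent levels
top8 <- names(sort(table(data$naics), decreasing = TRUE))[1:8]

# Keep the top8 levels and collapse everything else into 'other'
data$naics <- fct_other(data$naics, keep = top8, other_level = "other")

levels(data$naics)           # the eight kept levels, then 'other' last
sum(data$naics == "other")   # rows that were collapsed
```

    fct_other() places the collapsed level at the end of the level list, which is usually what you want for tables and plots.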
    

    Or use fct_other() inside a dplyr::mutate() call:

    library(dplyr)
    data <- mutate(data, naics = fct_other(naics, keep = top8, other_level = 'other')) 
    
    data %>% head(10)
       employees  naics
    1        420  other
    2        264  other
    3        189  other
    4        157 621610
    5        376 621610
    6        236  other
    7        658 621320
    8        959 621320
    9        216  other
    10       156  other
    

    Note that if the argument other_level is not set, the other levels default to 'Other' (uppercase 'O').
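
    A minimal check of that default on a small hypothetical factor (the factor f is made up for illustration):

```r
library(forcats)

# Hypothetical three-level factor
f <- factor(c("a", "b", "c", "a"))

# Without other_level, the collapsed level is named 'Other'
default_lvls <- levels(fct_other(f, keep = "a"))

# With other_level set, the collapsed level takes that name instead
custom_lvls <- levels(fct_other(f, keep = "a", other_level = "other"))

print(default_lvls)
print(custom_lvls)
```

    The capitalization matters if later code filters on the level name, for example naics == 'other'.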

    Conversely, if you had only a few levels you wanted converted to 'other', you could use the drop argument instead:

    data %>%  
      mutate(keep_fct = fct_other(naics, keep = top8, other_level = 'other'),
             drop_fct = fct_other(naics, drop = top8, other_level = 'other')) %>% 
      head(10)
    
       employees  naics keep_fct drop_fct
    1        474 621491    other   621491
    2        805 621111   621111    other
    3        434 621910    other   621910
    4        845 621111   621111    other
    5        243 621340    other   621340
    6        466 621493    other   621493
    7        369 621111   621111    other
    8         57 621493    other   621493
    9        144 621491    other   621491
    10       786 621910    other   621910
    

    dplyr also has recode_factor(), where you can set the .default argument to 'other', but with a large number of levels to recode, as in this example, it could be tedious:

    data %>% 
      mutate(naics = recode_factor(naics,
                                   `621111` = '621111', `621210` = '621210',
                                   `621399` = '621399', `621610` = '621610',
                                   `621330` = '621330', `621310` = '621310',
                                   `621511` = '621511', `621420` = '621420',
                                   `621320` = '621320',
                                   .default = 'other'))
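
    Since every pair above maps a level to itself, the pairs can probably be generated rather than typed out. A sketch, assuming a hypothetical keep_levels vector and toy data, that splices a named vector into recode_factor() with !!! (the pattern shown in the dplyr documentation for recode()):

```r
library(dplyr)

# Hypothetical toy data and levels to keep
data <- data.frame(
  naics = factor(c("621111", "621210", "621999", "621111", "621888"))
)
keep_levels <- c("621111", "621210")

# Named vector mapping each kept level to itself
key <- setNames(keep_levels, keep_levels)

# Splice the pairs in; anything not named in key falls through to .default
data$naics <- recode_factor(data$naics, !!!key, .default = "other")

levels(data$naics)
```

    Note that recode() and recode_factor() are marked as superseded in recent dplyr releases, so fct_other() is likely the more durable choice.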
    
