Cleaning up factor levels (collapsing multiple levels/labels)

礼貌的吻别 2020-11-22 14:27

What is the most effective (i.e. efficient / appropriate) way to clean up a factor containing multiple levels that need to be collapsed? That is, how to combine two or more fa

10 answers
  •  心在旅途
    2020-11-22 14:37

    UPDATE 2: See Uwe's answer which shows the new "tidyverse" way of doing this, which is quickly becoming the standard.

    UPDATE 1: Duplicated labels (but not levels!) are now indeed allowed (per my comment above); see Tim's answer.
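
    As a minimal sketch of what UPDATE 1 describes, assuming R 3.4.0 or later (which is where duplicated labels appear to have become allowed): the levels and labels arguments of factor() are paired by position, so repeating a label merges the corresponding levels. The variable name f is only for illustration.

    ```r
    # Assumption: R >= 3.4.0, where factor() accepts duplicated labels.
    x <- c("Y", "Y", "Yes", "N", "No", "H", NA)

    # levels and labels are matched by position; a repeated label
    # collapses the corresponding levels into one.
    f <- factor(x,
                levels = c("Y", "Yes", "N", "No"),
                labels = c("Yes", "Yes", "No", "No"))

    levels(f)                   # "Yes" "No"
    table(f, useNA = "ifany")   # "H" was not listed in levels, so it counts as NA
    ```

    Note that the levels themselves must still be unique; only the labels may repeat.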

    ORIGINAL ANSWER, BUT STILL USEFUL AND OF INTEREST: There is a little-known option to pass a named list to the levels<- replacement function, for exactly this purpose. The names of the list should be the desired names of the levels, and each element should hold the current levels that are to be renamed to that name. Some (including the OP, see Ricardo's comment to Tim's answer) prefer this for ease of reading.

    x <- c("Y", "Y", "Yes", "N", "No", "H", NA)
    x <- factor(x)
    levels(x) <- list("Yes"=c("Y", "Yes"), "No"=c("N", "No"))
    x
    ## [1] Yes  Yes  Yes  No   No   <NA> <NA>
    ## Levels: Yes No
    

    This is mentioned in the levels documentation; see also the examples there.

    value: For the 'factor' method, a vector of character strings with length at least the number of levels of 'x', or a named list specifying how to rename the levels.
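
    One consequence of the named-list form is worth spelling out: any existing level that does not appear in the list is turned into NA, and the new levels follow the order of the list names. The sketch below contrasts dropping "H" with keeping it by listing it explicitly; the variable names dropped and kept are only for illustration.

    ```r
    x <- factor(c("Y", "Y", "Yes", "N", "No", "H", NA))

    # "H" is not named in the list, so it becomes NA.
    dropped <- x
    levels(dropped) <- list(Yes = c("Y", "Yes"), No = c("N", "No"))
    sum(is.na(dropped))   # 2: the original NA plus the former "H"

    # Listing "H" under its own name keeps it as a level.
    kept <- x
    levels(kept) <- list(Yes = c("Y", "Yes"), No = c("N", "No"), H = "H")
    levels(kept)          # "Yes" "No" "H" (order of the list names)
    sum(is.na(kept))      # 1: only the original NA
    ```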

    This can also be done in one line, as Marek does here: https://stackoverflow.com/a/10432263/210673; the levels<- sorcery is explained here https://stackoverflow.com/a/10491881/210673.

    > `levels<-`(factor(x), list(Yes=c("Y", "Yes"), No=c("N", "No")))
    [1] Yes  Yes  Yes  No   No   <NA> <NA>
    Levels: Yes No
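
    Because the functional form returns a new factor instead of modifying its argument, it can be wrapped in a small helper and reused. The following is only a sketch; collapse_yes_no is a hypothetical name, not something from the linked answers.

    ```r
    # Hypothetical helper: collapses the Yes/No spellings and returns a new factor.
    collapse_yes_no <- function(v) {
      `levels<-`(factor(v), list(Yes = c("Y", "Yes"), No = c("N", "No")))
    }

    x <- c("Y", "Y", "Yes", "N", "No", "H", NA)
    y <- collapse_yes_no(x)

    is.character(x)   # TRUE: the input vector is left untouched
    levels(y)         # "Yes" "No"
    ```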
    
