Convert data.frame columns from factors to characters

时光取名叫无心 2020-11-22 04:43

I have a data frame. Let's call him bob:

> head(bob)
                 phenotype                         exclusion
GSM399350 3- 4- 8- 25- 44+         


        
18 answers
  • 2020-11-22 05:21

    This works for me; I finally figured out a one-liner:

    df <- as.data.frame(lapply(df, function(y) if (is.factor(y)) as.character(y) else y), stringsAsFactors = FALSE)
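    A quick check of the one-liner on a small made-up data frame (the column names and values are hypothetical). It includes an ordered factor, whose class is c("ordered", "factor"), which is why is.factor() is a safer test than comparing class(y) with "factor".

```r
# Hypothetical example data: two factor columns (one ordered) and one numeric column
df <- data.frame(
  id    = c("a", "b", "c"),
  size  = factor(c("S", "L", "M"), levels = c("S", "M", "L"), ordered = TRUE),
  value = c(1.5, 2.5, 3.5),
  stringsAsFactors = TRUE
)

# Convert every factor column (ordered or not) to character; leave the rest alone
df <- as.data.frame(
  lapply(df, function(y) if (is.factor(y)) as.character(y) else y),
  stringsAsFactors = FALSE
)

str(df)  # id and size are now chr, value stays num
```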
    
  • 2020-11-22 05:23

    With the dplyr package loaded, use

    library(dplyr)
    bob <- bob %>% mutate_at("phenotype", as.character)
    

    if you only want to convert the phenotype column.
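    If you would rather not load dplyr, the same single-column conversion can be done in base R. The bob below is a hypothetical stand-in for the question's data, with illustrative values only.

```r
# Hypothetical stand-in for bob (values are illustrative only)
bob <- data.frame(
  phenotype = c("3- 4- 8- 25- 44+", "3+ 4- 8-"),
  exclusion = c("x", "y"),
  stringsAsFactors = TRUE
)

# Base R equivalent: overwrite just the phenotype column
bob$phenotype <- as.character(bob$phenotype)

sapply(bob, class)  # phenotype is "character", exclusion is still "factor"
```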

  • 2020-11-22 05:25

    If you understand how factors are stored, you can avoid apply-based functions entirely. That isn't to say the apply solutions don't work well.

    A factor is stored as a vector of integer codes, each of which indexes into a character vector of 'levels'. You can see this by converting a factor to numeric:

    > fact <- as.factor(c("a","b","a","d"))
    > fact
    [1] a b a d
    Levels: a b d
    
    > as.numeric(fact)
    [1] 1 2 1 3
    

    The numbers returned in the last line correspond to the levels of the factor.

    > levels(fact)
    [1] "a" "b" "d"
    

    Notice that levels() returns a character vector. You can use this to convert a factor to strings compactly, by indexing the levels with the integer codes:

    > fact_character <- levels(fact)[as.numeric(fact)]
    > fact_character
    [1] "a" "b" "a" "d"
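    To confirm that the levels lookup gives exactly what the built-in as.character() returns, here is a small base R check (the variable names are just for illustration):

```r
fact <- factor(c("a", "b", "a", "d"))

codes       <- as.integer(fact)        # the stored integer codes: 1 2 1 3
via_levels  <- levels(fact)[codes]     # look each code up in the levels vector
via_builtin <- as.character(fact)      # the built-in conversion

identical(via_levels, via_builtin)     # TRUE
```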
    

    The same trick works for a factor of numbers, provided you wrap the expression in as.numeric():

    > num_fact <- factor(c(1,2,3,6,5,4))
    > num_fact
    [1] 1 2 3 6 5 4
    Levels: 1 2 3 4 5 6
    > num_num <- as.numeric(levels(num_fact)[as.numeric(num_fact)])
    > num_num
    [1] 1 2 3 6 5 4
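    In the example above the values happen to be 1 to 6, so the codes and the values coincide. With other numbers the difference matters: as.numeric() on the factor alone returns the codes, not the original values. The R FAQ (7.10) recommends as.numeric(levels(f))[f], which relies on indexing by a factor using its integer codes. The values below are made up for illustration.

```r
num_fact <- factor(c(10, 20, 30, 60, 50, 40))

codes  <- as.numeric(num_fact)                  # 1 2 3 6 5 4: the codes, NOT the values
values <- as.numeric(levels(num_fact))[num_fact] # 10 20 30 60 50 40: the original values
```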
    
  • 2020-11-22 05:26

    I typically make this function a part of all my projects. Quick and easy:

    unfactorize <- function(df) {
      # Convert every factor column (including ordered factors) to character
      for (i in which(vapply(df, is.factor, logical(1)))) df[[i]] <- as.character(df[[i]])
      return(df)
    }
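    A usage sketch for the helper, with the definition repeated so the snippet runs on its own. The test data frame and its column names are hypothetical; it mixes a plain factor, an ordered factor, and an integer column to show that only factors are touched.

```r
# Same helper, repeated so this snippet is self-contained
unfactorize <- function(df) {
  for (i in which(vapply(df, is.factor, logical(1)))) df[[i]] <- as.character(df[[i]])
  df
}

# Hypothetical test data: one plain factor, one ordered factor, one integer column
df <- data.frame(
  g = factor(c("x", "y")),
  o = factor(c("lo", "hi"), levels = c("lo", "hi"), ordered = TRUE),
  n = 1:2
)

out <- unfactorize(df)
vapply(out, class, character(1))  # g "character", o "character", n "integer"
```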
    
  • 2020-11-22 05:27

    When you create your data frame (with data.frame(), read.csv(), and so on), pass stringsAsFactors = FALSE so that character columns are never turned into factors in the first place.
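    A sketch of passing the argument at creation time, both to data.frame() and to read.csv() (inline text stands in for a CSV file; the data are hypothetical). Note that since R 4.0.0 the default for stringsAsFactors is already FALSE, so stating it explicitly mainly matters for code that must also run on older versions of R.

```r
# Pass stringsAsFactors = FALSE when the data frame is created...
bob <- data.frame(phenotype = c("3- 4-", "3+ 4-"), stringsAsFactors = FALSE)

# ...or when reading it in (the text argument stands in for a CSV file here)
bob2 <- read.csv(text = "phenotype,exclusion\n3- 4-,x\n3+ 4-,y",
                 stringsAsFactors = FALSE)

str(bob2)  # both columns are chr
```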

  • 2020-11-22 05:29

    Following on from Matt and Dirk: if you want to recreate your existing data frame without changing the global option, you can do it with lapply:

    bob <- data.frame(lapply(bob, as.character), stringsAsFactors=FALSE)
    

    This converts all variables to class "character". If you only want to convert factors, see Marek's solution below.
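    To make the "all variables" caveat concrete, here is a small example with a hypothetical numeric column sitting next to the factor: after the conversion it is character too.

```r
# Hypothetical data with a factor column and a numeric column
bob <- data.frame(phenotype = factor(c("3- 4-", "3+ 4-")), count = c(2, 7))

bob <- data.frame(lapply(bob, as.character), stringsAsFactors = FALSE)

str(bob)  # count has become chr as well: "2" "7"
```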

    As @hadley points out, the following is more concise.

    bob[] <- lapply(bob, as.character)
    

    In both cases, lapply returns a list. In the second case, however, assigning into bob[] replaces the contents of the existing columns while keeping bob's data.frame class and attributes (such as row names), so there is no need to convert back with as.data.frame() and stringsAsFactors = FALSE.
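    The same bracket-assignment idiom can be restricted to factor columns, which avoids the "converts everything" caveat. This is one possible sketch, not necessarily the exact solution referred to above; the data and the is_fac name are hypothetical.

```r
# Hypothetical data: a factor column next to a numeric column
bob <- data.frame(phenotype = factor(c("3- 4-", "3+ 4-")), count = c(2, 7))

# Select only the factor columns, then assign into that subset in place;
# bob stays a data.frame and the numeric column is left untouched
is_fac <- vapply(bob, is.factor, logical(1))
bob[is_fac] <- lapply(bob[is_fac], as.character)

str(bob)  # phenotype is chr, count is still num
```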
