I have a data frame. Let's call him bob:
> head(bob)
phenotype exclusion
GSM399350 3- 4- 8- 25- 44+
This works for me. I finally figured out a one-liner:
df <- as.data.frame(lapply(df, function(y) if (is.factor(y)) as.character(y) else y), stringsAsFactors = FALSE)
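A quick check of this approach, as a sketch on a made-up data frame (the column names and values are illustrative, not from the question). It tests columns with is.factor(), which also covers ordered factors, and sets stringsAsFactors = TRUE explicitly so the input has a factor column regardless of R version:

```r
# Hypothetical example data; stringsAsFactors = TRUE forces a factor column
df <- data.frame(id = c("x", "y", "z"),
                 score = c(1.5, 2.5, 3.5),
                 stringsAsFactors = TRUE)

# Convert only the factor columns, leaving everything else untouched
df <- as.data.frame(
  lapply(df, function(y) if (is.factor(y)) as.character(y) else y),
  stringsAsFactors = FALSE
)

# id is now "character", score stays "numeric"
sapply(df, class)
```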
With the dplyr package loaded, use
bob <- bob %>% mutate_at("phenotype", as.character)
if you only want to change the phenotype column specifically.
If you understand how factors are stored, you can avoid using apply-based functions to accomplish this, which isn't at all to imply that the apply solutions don't work well.
Factors are stored as integer indices tied to a vector of 'levels'. This can be seen if you convert a factor to numeric. So:
> fact <- as.factor(c("a","b","a","d"))
> fact
[1] a b a d
Levels: a b d
> as.numeric(fact)
[1] 1 2 1 3
The numbers returned in the last line correspond to the levels of the factor.
> levels(fact)
[1] "a" "b" "d"
Notice that levels() returns a character vector. You can use this fact to easily and compactly convert factors to strings or numerics like this:
> fact_character <- levels(fact)[as.numeric(fact)]
> fact_character
[1] "a" "b" "a" "d"
This also works for numeric values, provided you wrap your expression in as.numeric():
> num_fact <- factor(c(1,2,3,6,5,4))
> num_fact
[1] 1 2 3 6 5 4
Levels: 1 2 3 4 5 6
> num_num <- as.numeric(levels(num_fact)[as.numeric(num_fact)])
> num_num
[1] 1 2 3 6 5 4
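The indexing step matters because calling as.numeric() directly on a factor of numbers returns the internal codes rather than the original values. A minimal sketch of the difference (the variable names and values here are illustrative):

```r
num_fact <- factor(c(10, 20, 40, 20))

# Direct conversion returns the integer codes (positions within levels())
codes <- as.numeric(num_fact)

# Indexing the levels by those codes recovers the original values
values <- as.numeric(levels(num_fact)[as.numeric(num_fact)])

# Going through as.character() first gives the same result
values2 <- as.numeric(as.character(num_fact))

codes    # 1 2 3 2
values   # 10 20 40 20
```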
I typically make this function a part of all my projects. Quick and easy.
unfactorize <- function(df) {
  # is.factor() also catches ordered factors, whose class() has two elements
  for (i in which(sapply(df, is.factor))) df[[i]] <- as.character(df[[i]])
  return(df)
}
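A usage sketch on made-up data. The function is repeated so the snippet runs on its own, and it uses is.factor() as the column test (my assumption, chosen so that the ordered factor column below is converted as well):

```r
unfactorize <- function(df) {
  for (i in which(sapply(df, is.factor))) df[[i]] <- as.character(df[[i]])
  df
}

# Hypothetical data: a plain factor, an ordered factor, and an integer column
d <- data.frame(g = factor(c("a", "b")),
                o = factor(c("lo", "hi"), levels = c("lo", "hi"), ordered = TRUE),
                n = 1:2)

r <- unfactorize(d)
str(r)  # g and o are chr, n is still int
```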
When you create your data frame, include stringsAsFactors = FALSE to avoid all misunderstandings.
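For instance, a sketch with made-up sample IDs and values (not the real contents of bob). Note that since R 4.0.0 the default is already stringsAsFactors = FALSE, so passing it explicitly mainly matters on older versions or for code that must behave the same on both:

```r
# Strings stay character columns instead of becoming factors
bob <- data.frame(sample = c("GSM1", "GSM2"),        # hypothetical IDs
                  phenotype = c("3- 4-", "8- 25-"),  # hypothetical values
                  stringsAsFactors = FALSE)

sapply(bob, class)  # both columns are "character"

# read.csv() and read.table() accept the same argument, e.g.
# bob <- read.csv("bob.csv", stringsAsFactors = FALSE)  # hypothetical file name
```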
Just following on Matt and Dirk. If you want to recreate your existing data frame without changing the global option, you can recreate it with an apply statement:
bob <- data.frame(lapply(bob, as.character), stringsAsFactors=FALSE)
This will convert all variables to class "character"; if you only want to convert factors, see Marek's solution below.
As @hadley points out, the following is more concise.
bob[] <- lapply(bob, as.character)
In both cases, lapply outputs a list; however, owing to the magical properties of R, the use of [] in the second case keeps the data.frame class of the bob object, thereby eliminating the need to convert back to a data.frame using as.data.frame with the argument stringsAsFactors = FALSE.
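The same [] assignment can be limited to the factor columns by indexing with a logical vector. This is a sketch on made-up data, and only one possible way to do it, not necessarily the solution referred to above:

```r
# Hypothetical stand-in for bob: one factor column, one integer column
bob <- data.frame(phenotype = factor(c("3- 4-", "8- 25-")),
                  count = c(5L, 7L))

# Logical index marking which columns are factors
is_fac <- sapply(bob, is.factor)

# Assigning into bob[is_fac] replaces only those columns; bob stays a data.frame
bob[is_fac] <- lapply(bob[is_fac], as.character)

class(bob)          # "data.frame"
sapply(bob, class)  # phenotype "character", count "integer"
```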