Convert factor to integer in a data frame

梦毁少年i 2020-12-01 19:57

I have the following code

anna.table <- data.frame(anna1, anna2)
write.table(anna.table, file = "anna.file.txt", sep = "\t", quote = FALSE)
2 Answers
  • 2020-12-01 20:53

    With anna.table (it is a data frame by the way, a table is something else!), the easiest way will be to just do:

    anna.table2 <- data.matrix(anna.table)
    

    as data.matrix() will convert factors to their underlying integer codes. This works for a data frame that contains only numeric, integer, factor or other variables that can be coerced to numeric. Be careful with character columns: in current versions of R, data.matrix() first converts them to factors and then to integer codes, so the original strings are lost (it is as.matrix() that would give you a character matrix instead).

    If you want anna.table2 to be a data frame, not a matrix, then you can subsequently do:

    anna.table2 <- data.frame(anna.table2)
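
    As a rough illustration of the data.matrix() route described above, here is a minimal sketch. The contents of anna.table are not shown in the question, so the `group` and `score` columns below are hypothetical stand-ins:

```r
## Hypothetical stand-in for anna.table; the real columns are not shown
## in the question.
anna.table <- data.frame(
  group = factor(c("low", "high", "low", "mid")),
  score = c(1.5, 2.5, 3.5, 4.5)
)

## Factor levels are sorted alphabetically by default: "high" "low" "mid"
levels(anna.table$group)

## data.matrix() replaces each factor value with its integer code
anna.table2 <- data.matrix(anna.table)
anna.table2[, "group"]

## Back to a data frame if a matrix is not what you want
anna.table2 <- data.frame(anna.table2)
str(anna.table2)
```

    Note that the codes follow the order of the levels ("high" is 1, "low" is 2, "mid" is 3), not the order in which the values appear in the data.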
    

    Another option is to coerce each factor variable to its integer codes yourself. Here is an example of that:

    ## dummy data
    set.seed(1)
    dat <- data.frame(a = factor(sample(letters[1:3], 10, replace = TRUE)), 
                      b = runif(10))
    
    ## sapply over `dat`, converting factor to numeric
    dat2 <- sapply(dat, function(x) if(is.factor(x)) {
                                        as.numeric(x)
                                    } else {
                                        x
                                    })
    dat2 <- data.frame(dat2) ## convert to a data frame
    

    Which gives:

    > str(dat)
    'data.frame':   10 obs. of  2 variables:
     $ a: Factor w/ 3 levels "a","b","c": 1 2 2 3 1 3 3 2 2 1
     $ b: num  0.206 0.177 0.687 0.384 0.77 ...
    > str(dat2)
    'data.frame':   10 obs. of  2 variables:
     $ a: num  1 2 2 3 1 3 3 2 2 1
     $ b: num  0.206 0.177 0.687 0.384 0.77 ...
    

    However, do note that the above will work only if you want the underlying numeric representation. If your factor has essentially numeric levels, then we need to be a bit cleverer in how we convert the factor to a numeric whilst preserving the "numeric" information coded in the levels. Here is an example:

    ## dummy data
    set.seed(1)
    dat3 <- data.frame(a = factor(sample(1:3, 10, replace = TRUE), levels = 3:1), 
                       b = runif(10))
    
    ## sapply over `dat3`, converting factor to numeric
    dat4 <- sapply(dat3, function(x) if(is.factor(x)) {
                                        as.numeric(as.character(x))
                                    } else {
                                        x
                                    })
    dat4 <- data.frame(dat4) ## convert to a data frame
    

    Note how we need to call as.character(x) first, before we call as.numeric(). The extra call recovers the level labels as text, and it is those labels that we then convert to numeric. To see why this matters, note what dat3$a is:

    > dat3$a
     [1] 1 2 2 3 1 3 3 2 2 1
    Levels: 3 2 1
    

    If we just convert that to numeric, we get the wrong data, because R converts the underlying level codes:

    > as.numeric(dat3$a)
     [1] 3 2 2 1 3 1 1 2 2 3
    

    If we coerce the factor to a character vector first, then to a numeric one, we preserve the original information rather than R's internal representation:

    > as.numeric(as.character(dat3$a))
     [1] 1 2 2 3 1 3 3 2 2 1
    

    If your data are like this second example, then you can't use the simple data.matrix() trick. It is equivalent to applying as.numeric() directly to the factor and, as this second example shows, that doesn't preserve the original information.
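
    The label-preserving conversion can also be wrapped in a small helper and applied column by column. This is only a sketch: `factor_to_number` is a hypothetical name, the data are made up (typed in by hand rather than sampled, so the result does not depend on the R version), and `as.numeric(levels(x))[as.integer(x)]` is used as an equivalent of as.numeric(as.character(x)). Assigning into `dat4[]` with lapply() keeps the result a data frame, so no separate data.frame() call is needed.

```r
## Hypothetical helper: convert a factor with numeric-looking levels to
## the numbers those levels spell out; leave other columns untouched.
factor_to_number <- function(x) {
  if (!is.factor(x)) {
    return(x)
  }
  ## Convert the (few) level labels once, then index by the integer codes
  as.numeric(levels(x))[as.integer(x)]
}

## Made-up data with the levels deliberately in reverse order
dat3 <- data.frame(a = factor(c(1, 2, 2, 3, 1), levels = 3:1),
                   b = c(0.1, 0.2, 0.3, 0.4, 0.5))

dat4 <- dat3
dat4[] <- lapply(dat3, factor_to_number)  # `[]` keeps the data frame shape

as.integer(dat3$a)  # internal codes: 3 2 2 1 3
dat4$a              # original values: 1 2 2 3 1
```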

  • 2020-12-01 21:03

    I know this is an older question, but I just had the same problem, so maybe this helps:

    In this case, your score column looks like it should never have become a factor column. That usually happens when read.table treats the column as text. Depending on which country you are from, you may write decimals with a "," rather than a ".". R then reads the column as character and turns it into a factor. In that case Gavin's answer won't work, because R won't convert "123,456" to 123.456. You can easily fix that in a text editor by replacing "," with ".", though.
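
    The same repair can be done inside R instead of a text editor. The sketch below uses made-up inline data (the `id` and `score` columns and their values are assumptions, since the original file is not shown). It relies on the `dec` argument of read.table() to declare the decimal mark at read time, and on sub() to repair a column that has already been loaded as a factor.

```r
## Made-up tab-separated input that uses "," as the decimal mark
raw <- "id\tscore\nA\t123,456\nB\t7,5\n"

## Option 1: tell read.table() about the decimal mark when reading
scores <- read.table(text = raw, header = TRUE, sep = "\t", dec = ",")
str(scores$score)  # numeric, no factor involved

## Option 2: repair a column that is already a factor
f <- factor(c("123,456", "7,5"))
fixed <- as.numeric(sub(",", ".", as.character(f), fixed = TRUE))
fixed
```

    Option 1 is usually the simpler route when you can re-read the file, because the column never becomes a factor in the first place.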
