Why does as.factor return a character when used inside apply?

别跟我提以往 2020-11-28 09:09

I want to convert variables into factors using apply():

a <- data.frame(x1 = rnorm(100),
                x2 = sample(c("a","b"), 100, rep


        
1 Answer
  • 2020-11-28 09:28

    apply converts your data.frame to a character matrix before applying the function. Use lapply instead:

    lapply(a, class)
    # $x1
    # [1] "numeric"
    # $x2
    # [1] "factor"
    # $x3
    # [1] "factor"
    

    In your second command, apply again coerces the result to a character matrix. With lapply, the factors are kept:

    a2 <- lapply(a, as.factor)
    lapply(a2, class)
    # $x1
    # [1] "factor"
    # $x2
    # [1] "factor"
    # $x3
    # [1] "factor"
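    A self-contained sketch of the difference. The question's own definition of a is cut off above, so the data frame df here, with its column names and values, is a hypothetical stand-in with one numeric, one character and one factor column:

    ```r
    # Hypothetical example data: one numeric, one character, one factor column
    df <- data.frame(num = c(1.5, -0.2, 3.1),
                     chr = c("a", "b", "a"),
                     fct = factor(c("a", "a", "b")),
                     stringsAsFactors = FALSE)

    # apply() coerces df to a matrix first, so every column arrives as character
    apply_classes <- apply(df, 2, class)
    print(apply_classes)

    # lapply() iterates over the list of columns, so each column keeps its class
    converted <- lapply(df, as.factor)
    lapply_classes <- vapply(converted, class, character(1))
    print(lapply_classes)
    ```
    
    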
    

    For a quick look at the column types, you can also use str:

    str(a)
    # 'data.frame':   100 obs. of  3 variables:
    #  $ x1: num  -1.79 -1.091 1.307 1.142 -0.972 ...
    #  $ x2: Factor w/ 2 levels "a","b": 2 1 1 1 2 1 1 1 1 2 ...
    #  $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...
    

    Additional explanation, prompted by the comments:

    Why does the lapply work while apply doesn't?

    The first thing apply does is convert its argument to a matrix, so apply(a, ...) is equivalent to apply(as.matrix(a), ...). Because a mixes a numeric column with factor columns, that matrix has to be a character matrix, as str(as.matrix(a)) shows:

    chr [1:100, 1:3] " 0.075124364" "-1.608618269" "-1.487629526" ...
    - attr(*, "dimnames")=List of 2
      ..$ : NULL
      ..$ : chr [1:3] "x1" "x2" "x3"
    

    There are no factors left, so class returns "character" for every column.
    lapply, on the other hand, works on the columns themselves (a data.frame is a list of columns), so it gives you what you want: it effectively calls class(a$column_name) for each column.
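    To see that the coercion depends on the mix of column types, compare as.matrix() on a mixed data frame with an all-numeric one. Both data frames below are hypothetical examples:

    ```r
    # Mixed column types: as.matrix() must pick one common type, here character
    mixed <- data.frame(x = c(1, 2), g = factor(c("a", "b")))
    m_mixed <- as.matrix(mixed)
    print(typeof(m_mixed))

    # All-numeric columns: the matrix stays numeric
    numeric_only <- data.frame(x = c(1, 2), y = c(0.5, 1.5))
    m_numeric <- as.matrix(numeric_only)
    print(typeof(m_numeric))
    ```

    Since apply() passes a data frame through as.matrix(), a single character or factor column is enough to turn the whole matrix into character before the function sees any column.
    
    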

    The help page for apply (?apply) explains why apply with as.factor doesn't work:

    In all cases the result is coerced by as.vector to one of the basic vector types before the dimensions are set, so that (for example) factor results will be coerced to a character array.
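    The coercion of the results can be seen even when the input is already a character matrix, so no input conversion happens: each call to as.factor returns a factor, but apply() hands back a plain character matrix. The matrix m is a hypothetical example:

    ```r
    # A character matrix, so apply() does no input conversion here
    m <- matrix(c("a", "b", "b", "a"), nrow = 2,
                dimnames = list(NULL, c("c1", "c2")))

    # Each column becomes a factor inside FUN ...
    res <- apply(m, 2, as.factor)

    # ... but the assembled result is no longer a factor
    print(typeof(res))
    print(is.factor(res))
    ```
    
    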

    The help page for sapply (?sapply) explains why sapply with as.factor doesn't work either:

    Value (...) An atomic vector or matrix or list of the same length as X (...) If simplification occurs, the output type is determined from the highest type of the return values in the hierarchy NULL < raw < logical < integer < real < complex < character < list < expression, after coercion of pairlists to lists.

    So you never get a matrix of factors or a data.frame back.
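    The same simplification is visible with sapply(), and switching it off with simplify = FALSE returns one factor per column, just like lapply(). A sketch on a hypothetical data frame df:

    ```r
    # Hypothetical example data
    df <- data.frame(num = c(1.5, -0.2, 3.1),
                     chr = c("a", "b", "a"),
                     stringsAsFactors = FALSE)

    # Default simplification: the factors are flattened into a character matrix
    simplified <- sapply(df, as.factor)
    print(typeof(simplified))

    # simplify = FALSE skips simplification and keeps the list of factors
    kept <- sapply(df, as.factor, simplify = FALSE)
    print(vapply(kept, is.factor, logical(1)))
    ```
    
    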

    How to convert output to data.frame?

    Simply wrap the result in as.data.frame, as you suggested in your comment:

    a2 <- as.data.frame(lapply(a, as.factor))
    str(a2)
    # 'data.frame':   100 obs. of  3 variables:
    #  $ x1: Factor w/ 100 levels "-2.49629293159922",..: 60 6 7 63 45 93 56 98 40 61 ...
    #  $ x2: Factor w/ 2 levels "a","b": 1 1 2 2 2 2 2 1 2 2 ...
    #  $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...
    

    If you only want to convert selected character columns to factors, there is a trick:

    a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
    str(a3)
    # 'data.frame':   26 obs. of  3 variables:
    #  $ x1: chr  "a" "b" "c" "d" ...
    #  $ x2: chr  "A" "B" "C" "D" ...
    #  $ x3: chr  "A" "B" "C" "D" ...
    
    columns_to_change <- c("x1","x2")
    a3[, columns_to_change] <- lapply(a3[, columns_to_change], as.factor)
    str(a3)
    # 'data.frame':   26 obs. of  3 variables:
    #  $ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
    #  $ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
    #  $ x3: chr  "A" "B" "C" "D" ...
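    One caveat, worth checking if columns_to_change might hold a single name: a3[, columns_to_change] uses the default drop = TRUE, so with one column it returns a plain vector, and lapply() would then loop over its individual values rather than over the column. Indexing without the comma, a3[columns_to_change], always returns a data frame. A minimal sketch on a hypothetical data frame df3:

    ```r
    # Hypothetical data frame with two character columns
    df3 <- data.frame(x1 = letters[1:3], x2 = LETTERS[1:3],
                      stringsAsFactors = FALSE)

    # With a comma and a single column, drop = TRUE returns a vector
    print(class(df3[, "x2"]))   # "character"
    # Without a comma, the result is always a data frame
    print(class(df3["x2"]))     # "data.frame"

    # So index without a comma when converting the selected columns
    columns_to_change <- "x1"
    df3[columns_to_change] <- lapply(df3[columns_to_change], as.factor)
    str(df3)
    ```
    
    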
    

    You can use the same trick to convert all columns:

    a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
    a3[, ] <- lapply(a3, as.factor)
    str(a3)
    # 'data.frame':   26 obs. of  3 variables:
    #  $ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
    #  $ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
    #  $ x3: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
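    Assigning into the existing data frame, as in a3[, ] <- ... or the shorter df[] <- ... form, replaces only the columns and keeps attributes such as custom row names, whereas as.data.frame(lapply(...)) builds a new frame from a bare list. A small comparison on a hypothetical data frame df4:

    ```r
    # Hypothetical data frame with custom row names
    df4 <- data.frame(x = c("a", "b"), y = c("c", "d"),
                      row.names = c("r1", "r2"), stringsAsFactors = FALSE)

    # Rebuilding from the list returned by lapply(): row names are reset
    rebuilt <- as.data.frame(lapply(df4, as.factor))

    # Assigning into a copy of the existing frame: only the columns change
    in_place <- df4
    in_place[] <- lapply(in_place, as.factor)

    print(rownames(rebuilt))
    print(rownames(in_place))
    ```
    
    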
    