Converting a data frame to a matrix with plyr daply

花落未央 2021-02-09 11:18

I'm trying to use the daply function in the plyr package but I cannot get it to output properly. Even though the variable that makes up the matrix is

2 Answers
  • 2021-02-09 12:10

    If we take the OP at their word(s) in the title, then they may be looking for data.matrix(), a standard function in the base package that is always available in R.

    data.matrix() works by converting any factors to their numeric coding before converting the data frame to a matrix. Consider the following data frame:

    dat <- data.frame(A = 1:10, B = factor(sample(c("X","Y"), 10, replace = TRUE)))
    

    If we convert via as.matrix() we get a character matrix:

    > head(as.matrix(dat))
         A    B  
    [1,] " 1" "X"
    [2,] " 2" "X"
    [3,] " 3" "Y"
    [4,] " 4" "Y"
    [5,] " 5" "Y"
    [6,] " 6" "Y"
    

    or, if we convert via matrix(), we get a list with dimensions (a list-array, as mentioned in the Value section of ?daply, by the way):

    > head(matrix(dat))
         [,1]      
    [1,] Integer,10
    [2,] factor,10 
    > str(matrix(dat))
    List of 2
     $ : int [1:10] 1 2 3 4 5 6 7 8 9 10
     $ : Factor w/ 2 levels "X","Y": 1 1 2 2 2 2 1 2 2 1
     - attr(*, "dim")= int [1:2] 2 1
    

    data.matrix(), however, does the intended thing:

    > mat <- data.matrix(dat)
    > head(mat)
         A B
    [1,] 1 1
    [2,] 2 1
    [3,] 3 2
    [4,] 4 2
    [5,] 5 2
    [6,] 6 2
    > str(mat)
     int [1:10, 1:2] 1 2 3 4 5 6 7 8 9 10 ...
     - attr(*, "dimnames")=List of 2
      ..$ : NULL
      ..$ : chr [1:2] "A" "B"
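
    The integer codes in column B are positions in the factor's levels, so the original labels remain recoverable. A minimal sketch using the same data frame as above; the seed and the names codes and labels are assumptions added here for illustration:

    ```r
    # Same shape as the example above; the seed is arbitrary and only there
    # to make the run repeatable.
    set.seed(1)
    dat <- data.frame(A = 1:10, B = factor(sample(c("X", "Y"), 10, replace = TRUE)))
    mat <- data.matrix(dat)

    # Column B of the matrix holds the factor's integer codes.
    codes <- mat[, "B"]

    # Indexing the level labels by code gives the original values back.
    labels <- levels(dat$B)[codes]
    identical(labels, as.character(dat$B))
    ```

    Keeping levels(dat$B) around is what makes the conversion reversible, since the matrix itself stores only the codes.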
    
  • 2021-02-09 12:12

    The identity function isn't what you want here; from the help page, "All plyr functions use the same split-apply-combine strategy: they split the input into simpler pieces, apply .fun to each piece, and then combine the pieces into a single data structure." The simpler pieces in this case are subsets of the original data frame with unique Vehicle/Month combinations; the identity function just returns that subset, and these subsets are then used to fill the resulting matrix.

    That is, each element of the matrix you got is a data frame (which is a type of list) containing the rows for that Vehicle/Month combination.

    > try1 <- daply(DF, .(Vehicle, Month), identity)
    > try1[1,1]
    [[1]]
       Month Vehicle Samples
    1 Oct-10   31057     256
    

    You instead want to use a function that just gets the Samples portion of that data frame, like this:

    daply(DF, .(Vehicle, Month), function(x) x$Samples)
    

    which results in

           Month
    Vehicle Oct-10 Nov-10 Dec-10
      31057    256    267    159
      31059    316    293    268
      31060    348    250    206
    

    A few alternate ways of doing this are with cast from the reshape package (which returns a data frame)

    cast(DF, Vehicle~Month, value="Samples")
    

    or with the revised versions in reshape2, where dcast returns a data frame and acast returns a matrix (the argument is spelled value.var in current reshape2 releases)

    dcast(DF, Vehicle~Month, value.var="Samples")
    acast(DF, Vehicle~Month, value.var="Samples")
    

    with xtabs from the stats package

    xtabs(Samples ~ Vehicle + Month, DF)
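
    The question's DF is not shown, so the sketch below rebuilds a stand-in from the table printed earlier; the column types and the factor level order are assumptions. Note that xtabs sums Samples within each cell, so it only reproduces the raw values when each Vehicle/Month pair occurs once, and absent pairs come back as 0 rather than NA.

    ```r
    # Hypothetical reconstruction of DF from the daply output shown above.
    months <- c("Oct-10", "Nov-10", "Dec-10")
    DF <- data.frame(
      Month   = factor(rep(months, times = 3), levels = months),
      Vehicle = rep(c(31057, 31059, 31060), each = 3),
      Samples = c(256, 267, 159, 316, 293, 268, 348, 250, 206)
    )

    tab <- xtabs(Samples ~ Vehicle + Month, data = DF)

    # xtabs returns an object of class c("xtabs", "table"). Dropping the class
    # and the "call" attribute leaves a plain numeric matrix with dimnames.
    mat <- unclass(tab)
    attr(mat, "call") <- NULL
    mat["31057", "Oct-10"]
    ```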
    

    or by hand, which isn't hard at all using matrix indexing; almost all the code is just setting up the matrix.

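    # Assumes Vehicle and Month are both factors in DF.
    with(DF, {
      # Empty (NA-filled) matrix with one row per Vehicle level, one column per Month level
      out <- matrix(nrow=nlevels(Vehicle), ncol=nlevels(Month),
                    dimnames=list(Vehicle=levels(Vehicle), Month=levels(Month)))
      # cbind() on factors gives their integer codes, i.e. (row, column) index pairs
      out[cbind(Vehicle, Month)] <- Samples
      out
    })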
    

    The reshape function in the stats package can also be used to do this, but the syntax is difficult and I haven't used it once since learning cast and melt from the reshape package.
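
    For reference, a rough sketch of what the stats::reshape call might look like. DF is again a hypothetical reconstruction from the table printed earlier, and value_cols is a helper name introduced only for this example:

    ```r
    # Hypothetical reconstruction of DF from the daply output shown above.
    months <- c("Oct-10", "Nov-10", "Dec-10")
    DF <- data.frame(
      Month   = factor(rep(months, times = 3), levels = months),
      Vehicle = rep(c(31057, 31059, 31060), each = 3),
      Samples = c(256, 267, 159, 316, 293, 268, 348, 250, 206)
    )

    # One row per Vehicle, one Samples.<Month> column per Month value.
    wide <- reshape(DF, idvar = "Vehicle", timevar = "Month",
                    v.names = "Samples", direction = "wide")

    # Keep only the value columns, then strip the "Samples." prefix and use
    # Vehicle as the row names to end up with a plain matrix.
    value_cols <- grep("^Samples\\.", names(wide), value = TRUE)
    mat <- as.matrix(wide[value_cols])
    dimnames(mat) <- list(Vehicle = as.character(wide$Vehicle),
                          Month   = sub("^Samples\\.", "", value_cols))
    mat
    ```

    The extra cleanup after the call is a fair illustration of why cast and acast are usually the more convenient choice.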
