Converting a data frame to a matrix with plyr daply

Question

I'm trying to use the daply function in the plyr package, but I cannot get it to output properly. Even though the variable that makes up the matrix is numeric, the elements of the matrix are lists, not the values themselves. Here is a small subset of the data as an example:

   Month Vehicle Samples
1 Oct-10   31057     256
2 Oct-10   31059     316
3 Oct-10   31060     348
4 Nov-10   31057     267
5 Nov-10   31059     293
6 Nov-10   31060     250
7 Dec-10   31057     159
8 Dec-10   31059     268
9 Dec-10   31060     206

And I would like to be able to visualize the data in a matrix format, which would look something like this:

       Month
Vehicle Oct-10 Nov-10 Dec-10
  31057    256    267    159
  31059    316    293    268
  31060    348    250    206

Here are two alternative calls that I have tried (the latter because my original data frame has more columns than I show here):

daply(DF, .(Vehicle, Month), identity)
daply(DF,.(Vehicle,Month), colwise(identity,.(Samples)))

However, what I get instead is rather abstruse:

       Month
Vehicle Oct-10 Nov-10 Dec-10
  31057 List,3 List,3 List,3
  31059 List,3 List,3 List,3
  31060 List,3 List,3 List,3

I used the str function on the output as some commenters have suggested, and here is an excerpt:

List of 9
 $ :'data.frame':       1 obs. of  3 variables:
  ..$ Month  : Ord.factor w/ 3 levels "Oct-10"<"Nov-10"<..: 1
  ..$ Vehicle: Factor w/ 3 levels "31057","31059",..: 1
  ..$ Samples: int 256
 $ :'data.frame':       1 obs. of  3 variables:
  ..$ Month  : Ord.factor w/ 3 levels "Oct-10"<"Nov-10"<..: 1
  ..$ Vehicle: Factor w/ 3 levels "31057","31059",..: 2
  ..$ Samples: int 316

What am I missing? Also, is there a way to do this simply with the base packages? Thanks!

Below is the dput output of the data frame, if you'd like to reproduce this:

structure(list(Month = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 
3L, 3L), .Label = c("Oct-10", "Nov-10", "Dec-10"), class = c("ordered", 
"factor")), Vehicle = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L), .Label = c("31057", "31059", "31060"), class = "factor"), 
    Samples = c(256L, 316L, 348L, 267L, 293L, 250L, 159L, 268L, 
    206L)), .Names = c("Month", "Vehicle", "Samples"), class = "data.frame", row.names = c(NA, 
9L))

Answer 1:

The identity function isn't what you want here; from the help page, "All plyr functions use the same split-apply-combine strategy: they split the input into simpler pieces, apply .fun to each piece, and then combine the pieces into a single data structure." The simpler pieces in this case are subsets of the original data frame with unique Vehicle/Month combinations; the identity function just returns that subset, and these subsets are then used to fill the resulting matrix.

That is, each element of the matrix you got is a data frame (which is a type of list) containing the rows for that Month/Vehicle combination.

> try1 <- daply(DF, .(Vehicle, Month), identity)
> try1[1,1]
[[1]]
   Month Vehicle Samples
1 Oct-10   31057     256
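The "List,3" entries can be reproduced without plyr. The following is only a minimal base-R sketch: it rebuilds the question's DF with data.frame() instead of the dput output, and `piece` is a name made up here for one Vehicle/Month subset.

```r
# Rebuild the question's data (same values as the dput output).
DF <- data.frame(
  Month   = factor(rep(c("Oct-10", "Nov-10", "Dec-10"), each = 3),
                   levels = c("Oct-10", "Nov-10", "Dec-10"), ordered = TRUE),
  Vehicle = factor(rep(c("31057", "31059", "31060"), times = 3)),
  Samples = c(256L, 316L, 348L, 267L, 293L, 250L, 159L, 268L, 206L)
)

# One "piece" as the split step would see it: a one-row data frame.
piece <- DF[DF$Vehicle == "31057" & DF$Month == "Oct-10", ]

is.list(piece)  # TRUE: a data frame is a list of columns
length(piece)   # 3: one list element per column, hence "List,3"
piece$Samples   # 256: the single value that should go in the matrix cell
```

So identity hands back the whole three-column piece, while the cell only needs the Samples value.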

You instead want to use a function that just gets the Samples portion of that data frame, like this:

daply(DF, .(Vehicle, Month), function(x) x$Samples)

which results in

       Month
Vehicle Oct-10 Nov-10 Dec-10
  31057    256    267    159
  31059    316    293    268
  31060    348    250    206
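To confirm that the cells are now plain numbers, the result can be inspected directly. This is a sketch that assumes plyr is installed; DF is rebuilt with data.frame() rather than dput, and sum() is my own substitution so that the function still returns a single value if a Vehicle/Month pair ever had more than one row (with one row per pair, as here, it gives the same result as x$Samples).

```r
library(plyr)

# Same data as in the question.
DF <- data.frame(
  Month   = factor(rep(c("Oct-10", "Nov-10", "Dec-10"), each = 3),
                   levels = c("Oct-10", "Nov-10", "Dec-10"), ordered = TRUE),
  Vehicle = factor(rep(c("31057", "31059", "31060"), times = 3)),
  Samples = c(256L, 316L, 348L, 267L, 293L, 250L, 159L, 268L, 206L)
)

# Return one number per piece so daply can fill a numeric matrix.
m <- daply(DF, .(Vehicle, Month), function(x) sum(x$Samples))

is.matrix(m)            # TRUE
is.numeric(m)           # TRUE: no more list cells
m["31059", "Nov-10"]    # 293
```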

A few alternate ways of doing this are with cast from the reshape package (which returns a data frame)

cast(DF, Vehicle~Month, value="Samples")

with dcast and acast from reshape2, the revised version of reshape (the first returns a data frame, the second a matrix)

dcast(DF, Vehicle~Month, value.var="Samples")
acast(DF, Vehicle~Month, value.var="Samples")

with xtabs from the stats package

xtabs(Samples ~ Vehicle + Month, DF)

or by hand, which isn't hard at all using matrix indexing; almost all the code is just setting up the matrix.

with(DF, {
  out <- matrix(nrow=nlevels(Vehicle), ncol=nlevels(Month),
                dimnames=list(Vehicle=levels(Vehicle), Month=levels(Month)))
  out[cbind(Vehicle, Month)] <- Samples
  out
})
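As a sanity check, the base-R routes can be compared against each other. The sketch below rebuilds DF with data.frame() and uses names of my own choosing (by_hand, xt, tp). The tapply call is an extra base option that is not part of the list above; like xtabs, it aggregates, so sum is only a stand-in that is harmless when each Vehicle/Month pair has exactly one row.

```r
DF <- data.frame(
  Month   = factor(rep(c("Oct-10", "Nov-10", "Dec-10"), each = 3),
                   levels = c("Oct-10", "Nov-10", "Dec-10"), ordered = TRUE),
  Vehicle = factor(rep(c("31057", "31059", "31060"), times = 3)),
  Samples = c(256L, 316L, 348L, 267L, 293L, 250L, 159L, 268L, 206L)
)

# Matrix indexing: cbind() on factors yields their integer codes,
# which serve as (row, column) positions.
by_hand <- with(DF, {
  out <- matrix(NA_integer_, nrow = nlevels(Vehicle), ncol = nlevels(Month),
                dimnames = list(Vehicle = levels(Vehicle), Month = levels(Month)))
  out[cbind(Vehicle, Month)] <- Samples
  out
})

# xtabs sums Samples within each Vehicle/Month cell.
xt <- xtabs(Samples ~ Vehicle + Month, DF)

# tapply does the same grouping and returns a plain matrix.
tp <- with(DF, tapply(Samples, list(Vehicle = Vehicle, Month = Month), sum))

all(xt == by_hand)  # TRUE
all(tp == by_hand)  # TRUE
```

One difference worth keeping in mind: xtabs returns an object of class "xtabs"/"table", whereas the other two give ordinary matrices.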

The reshape function in the stats package can also be used to do this, but the syntax is difficult and I haven't used it once since learning cast and melt from the reshape package.
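For completeness, here is roughly what the reshape route looks like. Treat it as a sketch rather than the recommended way: DF is rebuilt with data.frame(), `wide` and `m` are names I made up, and the clean-up of the column names assumes reshape's default "Samples.<Month>" naming.

```r
DF <- data.frame(
  Month   = factor(rep(c("Oct-10", "Nov-10", "Dec-10"), each = 3),
                   levels = c("Oct-10", "Nov-10", "Dec-10"), ordered = TRUE),
  Vehicle = factor(rep(c("31057", "31059", "31060"), times = 3)),
  Samples = c(256L, 316L, 348L, 267L, 293L, 250L, 159L, 268L, 206L)
)

# Long to wide: one row per Vehicle, one Samples column per Month.
wide <- reshape(DF, idvar = "Vehicle", timevar = "Month", direction = "wide")

# Turn the data frame into a matrix keyed by Vehicle and Month.
m <- as.matrix(wide[, -1])
rownames(m) <- as.character(wide$Vehicle)
colnames(m) <- sub("^Samples\\.", "", colnames(m))

m["31057", "Oct-10"]  # 256
```

The extra renaming steps are a fair illustration of why cast or acast is the more comfortable choice.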

Answer 2:

If we take the OP at their word(s) in the title, then they may be looking for data.matrix() which is a standard function in the base package that is always available in R.

data.matrix() works by converting any factors to their numeric coding before converting the data frame to a matrix. Consider the following data frame:

dat <- data.frame(A = 1:10, B = factor(sample(c("X","Y"), 10, replace = TRUE)))

If we convert via as.matrix() we get a character matrix:

> head(as.matrix(dat))
     A    B  
[1,] " 1" "X"
[2,] " 2" "X"
[3,] " 3" "Y"
[4,] " 4" "Y"
[5,] " 5" "Y"
[6,] " 6" "Y"

or, via matrix(), one gets a list with dimensions (a list array, as mentioned in the Value section of ?daply, by the way):

> head(matrix(dat))
     [,1]      
[1,] Integer,10
[2,] factor,10 
> str(matrix(dat))
List of 2
 $ : int [1:10] 1 2 3 4 5 6 7 8 9 10
 $ : Factor w/ 2 levels "X","Y": 1 1 2 2 2 2 1 2 2 1
 - attr(*, "dim")= int [1:2] 2 1

data.matrix(), however, does the intended thing:

> mat <- data.matrix(dat)
> head(mat)
     A B
[1,] 1 1
[2,] 2 1
[3,] 3 2
[4,] 4 2
[5,] 5 2
[6,] 6 2
> str(mat)
 int [1:10, 1:2] 1 2 3 4 5 6 7 8 9 10 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "A" "B"
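Applied to the data frame from the question, data.matrix() behaves the same way, which is worth seeing before relying on it. In this sketch DF is rebuilt with data.frame() rather than dput. Note that the result keeps the long 9 x 3 layout and stores the factor codes, so Vehicle "31057" becomes 1; it is not the Vehicle-by-Month table asked for in the question body.

```r
DF <- data.frame(
  Month   = factor(rep(c("Oct-10", "Nov-10", "Dec-10"), each = 3),
                   levels = c("Oct-10", "Nov-10", "Dec-10"), ordered = TRUE),
  Vehicle = factor(rep(c("31057", "31059", "31060"), times = 3)),
  Samples = c(256L, 316L, 348L, 267L, 293L, 250L, 159L, 268L, 206L)
)

mat <- data.matrix(DF)

dim(mat)             # 9 3: still one row per observation
mat[1, "Vehicle"]    # 1: the factor code, not the label 31057
mat[1, "Samples"]    # 256
```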

Source: https://stackoverflow.com/questions/7006082/converting-a-data-frame-to-a-matrix-with-plyr-daply

Tags

dataframe

plyr