I'm trying to use the daply
function in the plyr
package but I cannot get it to output properly. Even though the variable that makes up the matrix is
If we take the OP at their word(s) in the title, then they may be looking for data.matrix()
which is a standard function in the base package that is always available in R.
data.matrix()
works by converting any factors to their numeric coding before converting the data frame to a matrix. Consider the following data frame:
dat <- data.frame(A = 1:10, B = factor(sample(c("X","Y"), 10, replace = TRUE)))
If we convert via as.matrix()
we get a character matrix:
> head(as.matrix(dat))
     A    B
[1,] " 1" "X"
[2,] " 2" "X"
[3,] " 3" "Y"
[4,] " 4" "Y"
[5,] " 5" "Y"
[6,] " 6" "Y"
or, if we convert via matrix()
we get a list with dimensions (a list array, as mentioned in the Value section of ?daply
by the way):
> head(matrix(dat))
[,1]
[1,] Integer,10
[2,] factor,10
> str(matrix(dat))
List of 2
$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
$ : Factor w/ 2 levels "X","Y": 1 1 2 2 2 2 1 2 2 1
- attr(*, "dim")= int [1:2] 2 1
data.matrix()
, however, does the intended thing:
> mat <- data.matrix(dat)
> head(mat)
     A B
[1,] 1 1
[2,] 2 1
[3,] 3 2
[4,] 4 2
[5,] 5 2
[6,] 6 2
> str(mat)
int [1:10, 1:2] 1 2 3 4 5 6 7 8 9 10 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "A" "B"
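Since the B column of the matrix now holds the integer codes of the factor rather than its labels, it can help to see how the two relate. The following is a minimal sketch using the same kind of data frame as above; the seed is an arbitrary choice made only so the sketch is reproducible, and the variable names codes and labels are introduced here for illustration.

```r
set.seed(1)  # arbitrary seed (an assumption), only for reproducibility
dat <- data.frame(A = 1:10,
                  B = factor(sample(c("X", "Y"), 10, replace = TRUE)))
mat <- data.matrix(dat)

# Column B of the matrix holds the factor's integer codes, not its labels.
codes <- mat[, "B"]

# Indexing the level vector by those codes recovers the original labels.
labels <- levels(dat$B)[codes]
identical(labels, as.character(dat$B))
```

So no information is lost as long as the levels of the original factor are kept around.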
The identity
function isn't what you want here; from the help page, "All plyr functions use the same split-apply-combine strategy: they split the input into simpler pieces, apply .fun to each piece, and then combine the pieces into a single data structure." The simpler pieces in this case are subsets of the original data frame with unique Vehicle/Month combinations; the identity function just returns that subset, and these subsets are then used to fill the resulting matrix.
That is, each element of the matrix you got is a data frame (which is a type of list) containing the rows for that Vehicle/Month combination.
> try1 <- daply(DF, .(Vehicle, Month), identity)
> try1[1,1]
[[1]]
Month Vehicle Samples
1 Oct-10 31057 256
You instead want to use a function that just gets the Samples
portion of that data frame, like this:
daply(DF, .(Vehicle, Month), function(x) x$Samples)
which results in
       Month
Vehicle Oct-10 Nov-10 Dec-10
  31057    256    267    159
  31059    316    293    268
  31060    348    250    206
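To make the split-apply-combine idea concrete, here is a rough sketch of the same three steps in base R. It is not how plyr is implemented internally, only an illustration of the strategy. The question's DF is not shown, so the data frame below is reconstructed from the result table above, and its column types (factors for Month and Vehicle) are an assumption.

```r
# DF reconstructed from the result table; column types are assumed.
months <- c("Oct-10", "Nov-10", "Dec-10")
DF <- data.frame(
  Month   = factor(rep(months, times = 3), levels = months),
  Vehicle = factor(rep(c(31057, 31059, 31060), each = 3)),
  Samples = c(256, 267, 159, 316, 293, 268, 348, 250, 206)
)

# Split: one piece (a one-row data frame) per Vehicle/Month combination.
# The first grouping factor (Vehicle) varies fastest in the result.
pieces <- split(DF, list(DF$Vehicle, DF$Month))

# Apply: pull the single Samples value out of each piece.
values <- vapply(pieces, function(x) x$Samples, numeric(1))

# Combine: fill a Vehicle-by-Month matrix column by column.
out <- matrix(values, nrow = nlevels(DF$Vehicle),
              dimnames = list(Vehicle = levels(DF$Vehicle),
                              Month = levels(DF$Month)))
out
```

Replacing the function in the apply step with identity would leave each piece as a whole data frame, which is why the original attempt produced a matrix of data frames.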
There are a few alternative ways of doing this. One is cast
from the reshape
package, which returns a data frame:
cast(DF, Vehicle~Month, value="Samples")
or the revised versions in reshape2
(dcast returns a data frame, acast a matrix):
dcast(DF, Vehicle~Month, value.var="Samples")
acast(DF, Vehicle~Month, value.var="Samples")
or with xtabs
from the stats
package:
xtabs(Samples ~ Vehicle + Month, DF)
or by hand, which isn't hard at all using matrix indexing; almost all the code is just setting up the matrix.
with(DF, {
  out <- matrix(nrow=nlevels(Vehicle), ncol=nlevels(Month),
                dimnames=list(Vehicle=levels(Vehicle), Month=levels(Month)))
  out[cbind(Vehicle, Month)] <- Samples
  out
})
The reshape
function in the stats package can also be used to do this, but the syntax is difficult and I haven't used it once since learning cast
and melt
from the reshape
package.
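For completeness, a call to stats::reshape for this case might look roughly like the sketch below. As before, DF is reconstructed from the result table shown earlier and its column types are an assumption. The result is a wide data frame with one row per Vehicle and one Samples column per Month, not a matrix.

```r
# DF reconstructed from the result table; column types are assumed.
months <- c("Oct-10", "Nov-10", "Dec-10")
DF <- data.frame(
  Month   = factor(rep(months, times = 3), levels = months),
  Vehicle = factor(rep(c(31057, 31059, 31060), each = 3)),
  Samples = c(256, 267, 159, 316, 293, 268, 348, 250, 206)
)

# idvar identifies the rows of the wide result, timevar the columns,
# and v.names the variable whose values fill the cells.
wide <- reshape(DF, idvar = "Vehicle", timevar = "Month",
                v.names = "Samples", direction = "wide")
wide
```

Compared with the cast and acast calls above, the three separate arguments are what make this version harder to remember.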