Question
I want to tabulate a data frame so that the levels of one factor variable become columns and each cell holds the corresponding value of another variable.
Here is what I tried:
a <- rep(1:3, 3)
d <- rep(1:3, each = 3)
b <- rnorm(9)
c <- runif(9)
dt <- data.frame(a, d, b, c)
dt
  a d          b         c
1 1 1  0.3819762 0.5199602
2 2 1  0.3896063 0.9144730
3 3 1  2.4356972 0.2888464
4 1 2  1.2697016 0.9831191
5 2 2 -1.9844689 0.2046947
6 3 2  0.3473766 0.4766178
7 1 3 -1.5461235 0.6187189
8 2 3  1.0829027 0.9089551
9 3 3 -0.1305324 0.6326141
I looked at data.table, plyr, and reshape2 but could not find a way to do this, so I fell back on an old-fashioned loop:
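mat <- matrix(NA, nrow = 3, ncol = 4)
for (i in 1:3) {
  mat[i, 1] <- i                             # first column holds the value of a
  for (j in 1:3) {
    # look the columns up inside dt rather than relying on the global a and d
    val <- dt$b[dt$a == i & dt$d == j]
    mat[i, j + 1] <- val
  }
}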
mat
     [,1]      [,2]       [,3]       [,4]
[1,]    1 0.3819762  1.2697016 -1.5461235
[2,]    2 0.3896063 -1.9844689  1.0829027
[3,]    3 2.4356972  0.3473766 -0.1305324
This works, but it takes forever on large data.
Is there a better option?
Answer 1:
This can be done in base R also:
reshape(dt,timevar="d",idvar="a",drop="c",direction="wide")
For your data, this gives...
  a       b.1        b.2        b.3
1 1 0.3819762  1.2697016 -1.5461235
2 2 0.3896063 -1.9844689  1.0829027
3 3 2.4356972  0.3473766 -0.1305324
Please call set.seed before drawing simulated data so that your example is easier to reproduce.
I don't know whether this solution is fast. Also, to use it in the future you have to get used to its confusing argument names ("timevar", "idvar", etc.), which often don't describe what you're actually doing.
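To make the point about set.seed concrete, here is a minimal reproducible sketch of the same reshape() call. The seed value 42 is an arbitrary assumption, so the numbers will not match the output printed in the question.

```r
# Reproducible version of the example data; the seed 42 is arbitrary.
set.seed(42)
dt <- data.frame(a = rep(1:3, 3),
                 d = rep(1:3, each = 3),
                 b = rnorm(9),
                 c = runif(9))

# One row per value of a, one column of b per value of d; c is dropped.
wide <- reshape(dt, timevar = "d", idvar = "a", drop = "c",
                direction = "wide")
names(wide)  # "a" "b.1" "b.2" "b.3"
```

Here idvar names the column that identifies a row of the result, and timevar names the column whose values become the new column suffixes.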
Answer 2:
Here's a data.table option:
library(data.table)
dt = data.table(dt)
dt[, as.list(b), by = a]
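Note that as.list(b) takes the values of b in row order within each group of a, so the columns line up with d only when the rows are already sorted by d, as they are in the question's data. A hedged sketch that does not rely on this, assuming each (a, d) pair occurs exactly once; the name DT, the seed, and the shuffling step are illustrative only:

```r
library(data.table)

set.seed(1)  # arbitrary seed, for reproducibility only
DT <- data.table(a = rep(1:3, 3), d = rep(1:3, each = 3), b = rnorm(9))
DT <- DT[sample(.N)]  # shuffle rows to mimic unsorted input

# Sort by d first so that b is in d order within each a;
# keyby = a also returns the groups sorted by a.
wide <- DT[order(d), as.list(b), keyby = a]
names(wide)  # "a" "V1" "V2" "V3"
```

The unnamed list elements get the default column names V1, V2, V3, which correspond to d = 1, 2, 3 here.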
Answer 3:
Using reshape2:
> library(reshape2)
> dcast(dt, a ~ d, value.var = "b")
  a         1          2          3
1 1 0.3819762  1.2697016 -1.5461235
2 2 0.3896063 -1.9844689  1.0829027
3 3 2.4356972  0.3473766 -0.1305324
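If a plain matrix like the one built by the loop in the question is enough, base R matrix indexing can fill every cell in one vectorized step. This is only a sketch, and it assumes that a and d are integer codes running from 1 to 3 and that each (a, d) pair occurs exactly once; the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
dt <- data.frame(a = rep(1:3, 3), d = rep(1:3, each = 3), b = rnorm(9))

# Rows are indexed by a, columns by d.
mat <- matrix(NA_real_, nrow = 3, ncol = 3)

# A two-column index matrix addresses one (row, column) cell per data row,
# so all nine cells are assigned without an explicit loop.
mat[cbind(dt$a, dt$d)] <- dt$b
mat
```

Unlike the loop, this does not store a in the first column; cbind(1:3, mat) would add it back.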
Source: https://stackoverflow.com/questions/16819360/tabulate-a-data-frame-in-r