I'm trying to use R's by command to get column means for subsets of a data frame. For example, consider this data frame:
> z = data.frame(
Dealing with the output of by can be really annoying. I just found a way to extract what you want as a data frame, with no extra packages needed.
So, if you do this:
aux <- by(z[,2:5],z$labels,colMeans)
You can then turn it into a data frame like this:
aux_df <- as.data.frame(t(aux[seq(nrow(aux)),seq(ncol(aux))]))
This just takes all the rows and columns of aux, transposes the result, and passes it to as.data.frame.
I hope that helps.
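To see why a conversion step is needed at all, it can help to look at what by actually returns. The following is only a minimal sketch: z_demo is a hypothetical stand-in data frame (not the z from the question) with a labels column and four numeric columns, and aux_demo is likewise an illustrative name.

```r
# Hypothetical stand-in for the data frame in the question (an assumption).
z_demo <- data.frame(labels = c("a", "a", "b", "c", "c"),
                     data = matrix(1:20, nrow = 5))

# One call to colMeans per label.
aux_demo <- by(z_demo[, 2:5], z_demo$labels, colMeans)

class(aux_demo)    # the result has class "by"
is.list(aux_demo)  # it is list-like: one element per label
aux_demo[[1]]      # named numeric vector of column means for the first label
```

Each element is a plain named vector, which is why binding or transposing the elements is enough to get a rectangular result.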
You can use ddply from the plyr package:
library(plyr)
ddply(z, .(labels), numcolwise(mean))
labels data.1 data.2 data.3 data.4
1 a 1.5 6.5 11.5 16.5
2 b 3.0 8.0 13.0 18.0
3 c 4.5 9.5 14.5 19.5
Or aggregate from stats:
aggregate(z[,-1], by=list(z$labels), mean)
Group.1 data.1 data.2 data.3 data.4
1 a 1.5 6.5 11.5 16.5
2 b 3.0 8.0 13.0 18.0
3 c 4.5 9.5 14.5 19.5
Or dcast from the reshape2 package:
library(reshape2)
dcast( melt(z), labels ~ variable, mean)
Using sapply:
t(sapply(split(z[,-1], z$labels), colMeans))
data.1 data.2 data.3 data.4
a 1.5 6.5 11.5 16.5
b 3.0 8.0 13.0 18.0
c 4.5 9.5 14.5 19.5
The output of by is a list, so you can use do.call to rbind its elements and then convert the result to a data frame:
as.data.frame(do.call("rbind",by(z[,2:5],z$labels,colMeans)))
data.1 data.2 data.3 data.4
a 1.5 6.5 11.5 16.5
b 3.0 8.0 13.0 18.0
c 4.5 9.5 14.5 19.5
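As a quick sanity check, the base R approaches in this thread can be compared against each other. This is a sketch under an assumption: z_demo is a hypothetical stand-in data frame, and the names m_by, m_split, and agg are illustrative only.

```r
# Hypothetical stand-in for the data frame in the question (an assumption).
z_demo <- data.frame(labels = c("a", "a", "b", "c", "c"),
                     data = matrix(1:20, nrow = 5))

# by + do.call(rbind): one row per label.
m_by <- do.call("rbind", by(z_demo[, 2:5], z_demo$labels, colMeans))

# split + sapply, transposed so that labels are rows.
m_split <- t(sapply(split(z_demo[, -1], z_demo$labels), colMeans))

# aggregate: labels end up in the first column instead of the row names.
agg <- aggregate(z_demo[, -1], by = list(z_demo$labels), mean)

# Compare the numeric values only, ignoring row and column names.
all.equal(unname(m_by), unname(m_split))
all.equal(unname(m_by), unname(as.matrix(agg[, -1])))
```

The main practical difference is where the group labels land: as row names with by and sapply, or as a regular column (Group.1) with aggregate.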