I have some simple commands that compute the total, mean and maximum of one variable for the rows where another variable equals a given value:
sum(data[data$var1==1,]$var2)
mean(data[data$var1==1,]$var2)
max(data[data$var1==1,]$var2)
You can use cbind to combine the three results into one row:
cbind(sum(data[data$var1==1,]$var2),mean(data[data$var1==1,]$var2),max(data[data$var1==1,]$var2))
Example using the mtcars data:
mydata<-mtcars
l<-cbind(sum(mydata[mydata$cyl==4,]$mpg),mean(mydata[mydata$cyl==4,]$mpg),max(mydata[mydata$cyl==4,]$mpg))
l<-data.frame(l)
names(l)<-c("sum","mean","max")
> l
    sum     mean  max
1 293.3 26.66364 33.9
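A small variation, offered as a sketch rather than as part of the original answer: subset the column once, combine the results with c() (which keeps the names), and then turn the named vector into the same one-row data frame. The names x, res and l2 are illustrative.

```r
# Subset once: mpg values for the 4-cylinder cars in the built-in mtcars data
x <- mtcars$mpg[mtcars$cyl == 4]

# c() with names gives a named vector, so no separate names() call is needed
res <- c(sum = sum(x), mean = mean(x), max = max(x))

# t() makes a 1 x 3 matrix; as.data.frame() turns it into a one-row data frame
l2 <- as.data.frame(t(res))
l2
```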
The ddply function from the plyr package does all of this for each category of var1 (here cyl):
library(plyr)
ddply(mydata,.(cyl),summarize, sum=sum(mpg),mean=mean(mpg), max=max(mpg))
  cyl   sum     mean  max
1   4 293.3 26.66364 33.9
2   6 138.2 19.74286 21.4
3   8 211.4 15.10000 19.2
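If you would rather not depend on plyr, base R's aggregate() can produce a similar per-group table. This is a swapped-in base R technique, not how ddply works internally, and stats, res and flat are illustrative names. Because the summary function returns three values, aggregate() stores them as a matrix column, which do.call(data.frame, ...) flattens into the columns mpg.sum, mpg.mean and mpg.max.

```r
# Summary function returning a named vector of three statistics
stats <- function(x) c(sum = sum(x), mean = mean(x), max = max(x))

# Apply stats() to mpg within each value of cyl
res <- aggregate(mpg ~ cyl, data = mtcars, FUN = stats)

# res$mpg is a matrix column; flatten it into ordinary data frame columns
flat <- do.call(data.frame, res)
flat
```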
I recommend checking out the data.table package, which is like a beefed-up version of data frames. One thing it does really well (and quickly, if you have a lot of data) is grouped summaries like this:
library(data.table)
as.data.table(mtcars)[, list(sum=sum(mpg), mean=mean(mpg), max=max(mpg)),
by=cyl][order(cyl)]
#    cyl   sum     mean  max
# 1:   4 293.3 26.66364 33.9
# 2:   6 138.2 19.74286 21.4
# 3:   8 211.4 15.10000 19.2
If you want to summarize by more than one variable, just use something like by=list(cyl,vs,otherColumnNamesHere).
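For example, grouping by both cyl and vs might look like the sketch below, assuming data.table is installed (dt and res are illustrative names). Only combinations that actually occur in the data get a row; mtcars has no 8-cylinder cars with vs equal to 1, so the result has five groups rather than six.

```r
library(data.table)

dt <- as.data.table(mtcars)

# Same summary as above, but grouped by two columns, then sorted by cyl and vs
res <- dt[, list(sum = sum(mpg), mean = mean(mpg), max = max(mpg)),
          by = list(cyl, vs)][order(cyl, vs)]
res
```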
Look at the tables package; its vignette shows how to do exactly what you are asking for.
> library(tables)
> tabular( ( factor(cyl) + 1) ~ mpg * (sum + mean + max), data=mtcars )
             mpg
 factor(cyl) sum   mean  max
 4           293.3 26.66 33.9
 6           138.2 19.74 21.4
 8           211.4 15.10 19.2
 All         642.9 20.09 33.9
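In the tabular() formula, the + 1 term is what adds the All row, computed over every car. If the tables package is not an option, a rough base R sketch of the same layout (without the nested mpg header) could use split() and sapply(); stats, by_cyl and tab are illustrative names.

```r
# Summary function returning a named vector of three statistics
stats <- function(x) c(sum = sum(x), mean = mean(x), max = max(x))

# split() gives one mpg vector per cyl value; sapply() returns the statistics
# as columns, so t() flips them into one row per cylinder count
by_cyl <- t(sapply(split(mtcars$mpg, mtcars$cyl), stats))

# Append the statistics over all cars, like the "+ 1" term in the formula
tab <- rbind(by_cyl, All = stats(mtcars$mpg))
round(tab, 2)
```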