I have a data frame that looks like this:
V1 V2
.. 1
.. 2
.. 1
.. 3
etc.
For each distinct V2 value I would like to calculate the variance of the corresponding V1 values.
And the old standby, tapply:
dat <- data.frame(x = runif(50), y = rep(letters[1:5],each = 10))
tapply(dat$x,dat$y,FUN = var)
a b c d e
0.03907351 0.10197081 0.08036828 0.03075195 0.08289562
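If it helps to see what tapply is doing, here is a minimal sketch in base R that performs the same split-and-apply by hand and compares the two results. The seed and the names by_hand and by_tapply are my own additions for illustration, not part of the answer above.

```r
set.seed(1)  # assumed seed, only so repeated runs give the same numbers
dat <- data.frame(x = runif(50), y = rep(letters[1:5], each = 10))

# split() breaks x into one vector per level of y; sapply() applies var to each piece
by_hand <- sapply(split(dat$x, dat$y), var)

# tapply() does the same grouping and applying in a single call
by_tapply <- tapply(dat$x, dat$y, FUN = var)

# compare the values only, ignoring names and array attributes
all.equal(as.vector(by_hand), as.vector(by_tapply))
```

The two objects differ only in their attributes (tapply returns a one-dimensional array, sapply a named vector), so the comparison is done on the bare values.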
Another solution uses data.table. It is a lot faster, which is especially useful when you have large data sets.
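library(data.table)
# dat is assumed to be the data frame from the question, with columns V1 and V2
dat2 <- data.table(dat)
# one row per distinct V2, holding the variance of V1 within that group
ans <- dat2[, list(variance = var(V1)), by = V2]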
There are a few ways to do this; I prefer aggregate:
dat <- data.frame(V1 = rnorm(50), V2=rep(1:5,10))
dat
# The formula V1 ~ V2 groups V1 by the values in V2;
# the last argument is the function to apply to each group.
aggregate(V1 ~ V2, data = dat, var)
V2 V1
1 1 0.9139360
2 2 1.6222236
3 3 1.2429743
4 4 1.1889356
5 5 0.7000294
Also look into ddply, daply, etc. in the plyr package.
library(plyr)  # ddply lives in plyr, not reshape
ddply(dat, .(V2), summarise, variance = var(V1))
Using dplyr, you can do:
library(dplyr)
data %>%
group_by(V2) %>%
summarize(var = var(V1))
Here we group by the unique values of V2 and find the variance of V1 for each group.