I have a dataframe and I would like to count the number of rows within each group. I regularly use the aggregate
function to sum data as follows:
Following @Joshua's suggestion, here's one way you might count the number of observations in your df dataframe where Year = 2007 and Month = Nov (assuming they are columns):
nrow(df[df$Year == 2007 & df$Month == "Nov", ])
and with aggregate, following @GregSnow:
aggregate(x ~ Year + Month, data = df, FUN = length)
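To make the two calls above concrete, here is a minimal sketch on made-up data. The sample values are hypothetical; only the column names Year, Month and x are taken from the answer.

```r
# Hypothetical sample data; the column names mirror the ones used above.
df <- data.frame(
  Year  = c(2007, 2007, 2007, 2008, 2008),
  Month = c("Nov", "Nov", "Dec", "Nov", "Nov"),
  x     = c(1.5, 2.0, 3.5, 4.0, 5.5)
)

# Number of rows in one specific group (row filter goes before the comma)
n_nov_2007 <- nrow(df[df$Year == 2007 & df$Month == "Nov", ])

# Number of rows for every Year/Month combination present in the data
counts <- aggregate(x ~ Year + Month, data = df, FUN = length)
print(counts)
```

In the result, the x column holds the group size rather than a sum of x, because FUN is length.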
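You can also use the by function, as in by(df1$Year, df1$Month, count), which produces a list with the counts for each group. (Here count is presumably plyr's count function, given the x/freq columns in the output.)
The output will look like: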
df1$Month: Feb
x freq
1 2012 1
2 2013 1
3 2014 5
---------------------------------------------------------------
df1$Month: Jan
x freq
1 2012 5
2 2013 2
---------------------------------------------------------------
df1$Month: Mar
x freq
1 2012 1
2 2013 3
3 2014 2
The simplest option with aggregate is the length function, which gives you the length of the vector in each subset. Sometimes it is a little more robust to use function(x) sum(!is.na(x)), which counts only the non-missing values.
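A small sketch of the difference between the two functions, on hypothetical data with one missing value. One assumption to flag: the formula interface of aggregate drops rows containing NA by default (na.action = na.omit), so na.action = na.pass is set here to let the NA reach FUN.

```r
# Hypothetical data with one missing value in x
df <- data.frame(
  Month = c("Jan", "Jan", "Jan", "Feb", "Feb"),
  x     = c(10, NA, 30, 40, 50)
)

# length counts every row in the group, including the one where x is NA
n_rows <- aggregate(x ~ Month, data = df, FUN = length,
                    na.action = na.pass)

# sum(!is.na(x)) counts only the rows where x is not missing
n_valid <- aggregate(x ~ Month, data = df,
                     FUN = function(x) sum(!is.na(x)),
                     na.action = na.pass)
```

With the default na.action, both calls would return the same numbers, since the NA row is removed before FUN is applied.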
For my aggregations I usually end up wanting to see the mean and "how big is this group" (a.k.a. length). So this is my handy snippet for those occasions:
agg.mean <- aggregate(columnToMean ~ columnToAggregateOn1*columnToAggregateOn2, yourDataFrame, FUN="mean")
agg.count <- aggregate(columnToMean ~ columnToAggregateOn1*columnToAggregateOn2, yourDataFrame, FUN="length")
aggcount <- agg.count$columnToMean
agg <- cbind(aggcount, agg.mean)
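As an alternative sketch (not the snippet author's method), the mean and the group size can also be computed in a single aggregate call by letting FUN return a named vector. The data frame df and the columns g and y below are hypothetical.

```r
# Hypothetical data: g is the grouping column, y the value to summarise
df <- data.frame(
  g = c("a", "a", "b"),
  y = c(1, 3, 10)
)

# FUN returns two named values per group, so the y column of the
# result is a matrix with columns "mean" and "n"
agg <- aggregate(y ~ g, data = df,
                 FUN = function(v) c(mean = mean(v), n = length(v)))

# Flatten the matrix column into ordinary columns y.mean and y.n
agg_flat <- do.call(data.frame, agg)
print(agg_flat)
```

The trade-off is that the intermediate result holds a matrix column, which is why the flattening step is needed before treating the output like a regular data frame.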
Create a new variable Count with a value of 1 for each row:
df1["Count"] <- 1
Then aggregate the dataframe, summing the Count column within each group:
df2 <- aggregate(df1[c("Count")], by=list(Year=df1$Year, Month=df1$Month), FUN=sum, na.rm=TRUE)
There are plenty of wonderful answers here already, but I wanted to throw in one more option for those wanting to add a new column to the original dataset that contains the number of times that row's Year/Month combination is repeated.
df1$counts <- sapply(X = paste(df1$Year, df1$Month),
                     FUN = function(x) { sum(paste(df1$Year, df1$Month) == x) })
The same could be accomplished by combining any of the above answers with the merge() function.
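A minimal sketch of the merge() route, assuming a df1 with Year and Month columns as above. The sample values and the helper column x are hypothetical.

```r
# Hypothetical sample data
df1 <- data.frame(
  Year  = c(2012, 2012, 2013, 2012),
  Month = c("Jan", "Jan", "Jan", "Feb"),
  x     = c(1, 2, 3, 4)
)

# One row per Year/Month group, holding the number of rows in that group
counts <- aggregate(x ~ Year + Month, data = df1, FUN = length)
names(counts)[names(counts) == "x"] <- "counts"

# Attach the group size to every original row
# (note: merge may return the rows in a different order)
df1_counts <- merge(df1, counts, by = c("Year", "Month"))
print(df1_counts)
```

Compared with the sapply version, this builds the counts once per group rather than once per row, at the cost of possibly reordering the rows.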