Count number of rows within each group

夕颜 2020-11-21 05:01

I have a data frame and I would like to count the number of rows within each group. I regularly use the aggregate function to sum data as follows:
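For context, a summing call of the kind described might look like the sketch below. The data frame df1, its columns x, Year and Month, and all values are assumptions for illustration; the column names are borrowed from the answers.

```r
# Hypothetical example data; column names follow those used in the answers.
df1 <- data.frame(
  Year  = c(2012, 2012, 2012, 2013, 2013),
  Month = c("Jan", "Jan", "Feb", "Jan", "Jan"),
  x     = c(10, 20, 5, 1, 2)
)

# Sum x within each Year/Month combination.
sums <- aggregate(x ~ Year + Month, data = df1, FUN = sum)
print(sums)
```

The question is then which function to pass as FUN so that the same call returns row counts instead of sums.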



        
14 answers
  • 2020-11-21 05:47

    Following @Joshua's suggestion, here's one way to count the number of observations in your df data frame where Year is 2007 and Month is "Nov" (assuming both are columns):

    nrow(df[df$Year == 2007 & df$Month == "Nov", ])
    

    and with aggregate, following @GregSnow:

    aggregate(x ~ Year + Month, data = df, FUN = length)
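A small worked sketch of both approaches on made-up data. The data frame contents are illustrative assumptions, and x stands in for any column without missing values.

```r
# Hypothetical example data.
df <- data.frame(
  Year  = c(2007, 2007, 2007, 2008),
  Month = c("Nov", "Nov", "Dec", "Nov"),
  x     = c(1.5, 2.0, 3.1, 4.2)
)

# Rows in one specific group: the logical condition goes before the comma,
# so that it selects rows rather than columns.
n_nov_2007 <- nrow(df[df$Year == 2007 & df$Month == "Nov", ])

# Counts for every Year/Month combination at once.
counts <- aggregate(x ~ Year + Month, data = df, FUN = length)
print(counts)
```

The first form answers a question about a single group; the second returns one row per group, with the count stored in the x column.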
    
  • 2020-11-21 05:49

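    You can use the by function, for example by(df1$Year, df1$Month, count), which produces a list of counts per group. Here count is presumably plyr::count, judging by the x and freq columns in the output.

    The output will look like this: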

    df1$Month: Feb
         x freq
    1 2012    1
    2 2013    1
    3 2014    5
    --------------------------------------------------------------- 
    df1$Month: Jan
         x freq
    1 2012    5
    2 2013    2
    --------------------------------------------------------------- 
    df1$Month: Mar
         x freq
    1 2012    1
    2 2013    3
    3 2014    2
    
  • 2020-11-21 05:50

    The simplest option with aggregate is the length function, which gives the length of the vector in each subset. A somewhat more robust alternative is function(x) sum(!is.na(x)), which counts only the non-missing values.
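The difference between the two only shows up when the counted column contains NA. A minimal sketch, assuming a made-up data frame df1 with a missing value in x. Note that the formula interface of aggregate drops rows with NA by default (na.action = na.omit), so na.action = na.pass is set here to make the difference visible.

```r
# Hypothetical example data with one missing value.
df1 <- data.frame(
  Year = c(2012, 2012, 2012, 2013),
  x    = c(1, NA, 3, 4)
)

# length counts every row in the group, including the one where x is NA.
n_rows <- aggregate(x ~ Year, data = df1, FUN = length,
                    na.action = na.pass)

# sum(!is.na(x)) counts only the non-missing values of x.
n_obs <- aggregate(x ~ Year, data = df1,
                   FUN = function(x) sum(!is.na(x)),
                   na.action = na.pass)

print(n_rows)
print(n_obs)
```

Which one is wanted depends on whether "rows per group" or "usable observations per group" is the quantity of interest.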

  • 2020-11-21 05:50

    For my aggregations I usually end up wanting to see both the mean and "how big is this group" (a.k.a. length). So this is my handy snippet for those occasions:

    agg.mean <- aggregate(columnToMean ~ columnToAggregateOn1*columnToAggregateOn2, yourDataFrame, FUN="mean")
    agg.count <- aggregate(columnToMean ~ columnToAggregateOn1*columnToAggregateOn2, yourDataFrame, FUN="length")
    aggcount <- agg.count$columnToMean
    agg <- cbind(aggcount, agg.mean)
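A variation on the same idea, sketched on made-up data: instead of cbind, which relies on both results having the same row order, the two aggregates can be joined on the grouping columns with merge. The names g1, g2 and value are hypothetical placeholders, and + is used in the formula as the conventional way to list grouping columns.

```r
# Hypothetical example data.
yourDataFrame <- data.frame(
  g1    = c("a", "a", "b", "b", "b"),
  g2    = c("u", "u", "u", "v", "v"),
  value = c(1, 3, 5, 7, 9)
)

agg.mean  <- aggregate(value ~ g1 + g2, data = yourDataFrame, FUN = mean)
agg.count <- aggregate(value ~ g1 + g2, data = yourDataFrame, FUN = length)

# Join on the grouping columns; the suffixes tell the two value columns apart.
agg <- merge(agg.mean, agg.count, by = c("g1", "g2"),
             suffixes = c(".mean", ".count"))
print(agg)
```

The result has one row per group with columns value.mean and value.count.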
    
  • 2020-11-21 05:51

    Create a new variable Count with a value of 1 for each row:

    df1["Count"] <- 1
    

    Then aggregate the data frame, summing the Count column:

    df2 <- aggregate(df1[c("Count")], by=list(Year=df1$Year, Month=df1$Month), FUN=sum, na.rm=TRUE)
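The two steps can be tried end to end on a small made-up data frame. The values below are assumptions for illustration only.

```r
# Hypothetical example data.
df1 <- data.frame(
  Year  = c(2012, 2012, 2013, 2013, 2013),
  Month = c("Jan", "Jan", "Jan", "Feb", "Feb")
)

# Step 1: a helper column that is 1 for every row.
df1["Count"] <- 1

# Step 2: summing the helper column per group yields the row count.
df2 <- aggregate(df1["Count"],
                 by = list(Year = df1$Year, Month = df1$Month),
                 FUN = sum)
print(df2)
```

One caveat worth keeping in mind: aggregate omits rows that have a missing value in any of the by variables, so such rows are not counted in any group.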
    
  • 2020-11-21 05:53

    There are plenty of wonderful answers here already, but I wanted to throw in one more option for those wanting to add a new column to the original dataset that contains the number of rows sharing that row's Year and Month combination.

    df1$counts <- sapply(X = paste(df1$Year, df1$Month), 
                         FUN = function(x) { sum(paste(df1$Year, df1$Month) == x) })
    

    The same could be accomplished by combining any of the above answers with the merge() function.
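A minimal sketch of the merge route, assuming a made-up df1 with a value column x that has no missing values. The per-group counts are computed once with aggregate and then joined back onto the original rows.

```r
# Hypothetical example data.
df1 <- data.frame(
  Year  = c(2012, 2012, 2013, 2012),
  Month = c("Jan", "Jan", "Jan", "Feb"),
  x     = c(1, 2, 3, 4)
)

# One row per Year/Month group, with the group size in x.
counts <- aggregate(x ~ Year + Month, data = df1, FUN = length)

# Rename the count column so it does not clash with x in df1.
names(counts)[names(counts) == "x"] <- "counts"

# Attach the group size to every original row.
df1_counted <- merge(df1, counts, by = c("Year", "Month"))
print(df1_counted)
```

Unlike the sapply version, merge sorts the result by the key columns by default, so the original row order is not preserved.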
