Calculate average monthly total by groups from data.table in R

暖寄归人 2020-12-30 07:29

I have a data.table with a row for each day over a 30 year period with a number of different variable columns. The reason for using data.table is that the .csv file I'm usi

3 Answers
  • 2020-12-30 08:02

    Since you said in your question that you would be open to a completely new solution, you could try the following with dplyr:

    df$Date <- as.Date(df$Date, format="%Y-%m-%d")
    df$Year.Month <- format(df$Date, '%Y-%m')
    df$Month <- format(df$Date, '%m')
    
    require(dplyr)
    
    df %>%
      group_by(Key, Year.Month, Month) %>%
      summarize(Runoff = sum(Runoff)) %>%
      ungroup() %>%
      group_by(Key, Month) %>%
      summarize(mean(Runoff))
    

    EDIT #1 after comment by @Henrik: The same can be done by:

    df %>%
      group_by(Key, Month, Year.Month) %>%
      summarize(Runoff = sum(Runoff)) %>%
      summarize(mean(Runoff))
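
    The shorter version works because each summarize() call drops the last grouping variable. A minimal sketch of that behaviour, assuming dplyr 1.0 or later (for the .groups argument) and a small made-up data frame; the values and the mean_runoff column name are illustrative and are not the asker's data:

    ```r
    library(dplyr)

    # Hypothetical toy data: two keys, two Januaries each
    df <- data.frame(
      Key    = c("A", "A", "A", "A", "B", "B"),
      Date   = as.Date(c("2000-01-01", "2000-01-02", "2001-01-01",
                         "2001-01-02", "2000-01-01", "2001-01-01")),
      Runoff = c(1, 2, 3, 4, 2, 4)
    )
    df$Year.Month <- format(df$Date, "%Y-%m")
    df$Month      <- format(df$Date, "%m")

    # Step 1: monthly totals; "drop_last" removes only Year.Month from the grouping
    monthly <- df %>%
      group_by(Key, Month, Year.Month) %>%
      summarize(Runoff = sum(Runoff), .groups = "drop_last")

    group_vars(monthly)   # the remaining grouping variables: Key and Month

    # Step 2: average the monthly totals within the remaining groups
    result <- monthly %>%
      summarize(mean_runoff = mean(Runoff), .groups = "drop")
    ```

    With this toy data, key A has January totals of 3 and 7, and key B has totals of 2 and 4, so the averages are 5 and 3.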
    

    EDIT #2, to round things off: this is another way of doing it, in which the second grouping is more explicit. Thanks to @Henrik for his comments.

    df %>%
      group_by(Key, Month, Year.Month) %>%
      summarize(Runoff = sum(Runoff)) %>%
      group_by(Key, Month, .add = FALSE) %>%    # regroup by Key and Month only, dropping Year.Month (dplyr < 1.0 spelled this argument `add`)
      summarize(mean(Runoff))
    

    It produces the following result:

    #Source: local data frame [2 x 3]
    #Groups: Key
    #
    #  Key Month mean(Runoff)
    #1   A    01     4.366667
    #2   B    01     3.266667
    

    You can then reshape the output to match your desired output using e.g. reshape2. Suppose you stored the output of the above operation in a data.frame df2, then you could do:

    require(reshape2)
    
    df2 <- dcast(df2, Key  ~ Month, sum, value.var = "mean(Runoff)")
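
    As a possible variation, giving the summary column a plain name makes the reshape step easier to read. The sketch below swaps in data.table's own dcast() instead of reshape2; the table contents and the mean_runoff column name are made up for illustration:

    ```r
    library(data.table)

    # Hypothetical long-format result: one row per Key and Month
    df2 <- data.table(
      Key         = c("A", "A", "B", "B"),
      Month       = c("01", "02", "01", "02"),
      mean_runoff = c(5, 6, 3, 4)
    )

    # One row per Key, one column per Month
    wide <- dcast(df2, Key ~ Month, value.var = "mean_runoff")
    ```

    Because each Key and Month pair occurs only once here, no aggregation function is needed in the call.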
    
  • 2020-12-30 08:09

    If you're not looking for complicated functions and just want the mean, then the following should suffice:

    DT[, sum(Runoff) / length(unique(year(Date))), list(Key, month(Date))]
    #   Key month       V1
    #1:   A     1 4.366667
    #2:   B     1 3.266667
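
    The one-liner works because the mean of the monthly totals equals the grand total for that calendar month divided by the number of years that contribute to it. A small sketch on made-up data (the values and the avg column name are illustrative, not the asker's):

    ```r
    library(data.table)

    # Hypothetical toy data: two keys, two Januaries each
    DT <- data.table(
      Key    = c("A", "A", "A", "A", "B", "B"),
      Date   = as.Date(c("2000-01-01", "2000-01-02", "2001-01-01",
                         "2001-01-02", "2000-01-01", "2001-01-01")),
      Runoff = c(1, 2, 3, 4, 2, 4)
    )

    # Total runoff per Key and calendar month, divided by the number of
    # distinct years that have data in that group
    one_step <- DT[, .(avg = sum(Runoff) / length(unique(year(Date)))),
                   by = .(Key, month = month(Date))]
    ```

    Here key A has January totals of 3 and 7 (average 5) and key B has totals of 2 and 4 (average 3), which matches 10 / 2 and 6 / 2.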
    
  • 2020-12-30 08:11

    The only way I could think of doing it was in two steps. Probably not the best way, but here goes:

    DT[, c("YM", "Month") := list(substr(Date, 1, 7), substr(Date, 6, 7))]
    DT[, Runoff2 := sum(Runoff), by = c("Key", "YM")]
    DT[, mean(Runoff2), by = c("Key", "Month")]
    
    ##   Key Month       V1
    ## 1:   A    01 4.366667
    ## 2:   B    01 3.266667
    

    Just to show another (very similar) way:

    DT[, c("year", "month") := list(year(Date), month(Date))]
    DT[, Runoff2 := sum(Runoff), by=list(Key, year, month)]
    DT[, mean(Runoff2), by=list(Key, month)]
    

    Note that you don't have to create new columns, as by supports expressions as well. That is, you can directly use them in by as follows:

    DT[, Runoff2 := sum(Runoff), by=list(Key, year = year(Date), month = month(Date))]
    

    But since you need to aggregate more than once, it's better (for speed) to store them as additional columns, as @David has shown here.
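
    One caveat that may be worth checking with the := approach: because := writes the monthly total onto every daily row, the final mean() is weighted by the number of rows in each month. That makes no difference when every year contributes the same number of days (as with January), but it can when the counts differ, for example in February or with missing days. A sketch on made-up data, comparing it with aggregating first and averaging afterwards (all names and values are illustrative):

    ```r
    library(data.table)

    # Hypothetical toy data: three February rows in 2000, one in 2001
    DT <- data.table(
      Key    = "A",
      Date   = as.Date(c("2000-02-01", "2000-02-02", "2000-02-03", "2001-02-01")),
      Runoff = c(1, 2, 3, 10)
    )

    # := approach: the monthly total is repeated on every row of that month
    DT[, Runoff2 := sum(Runoff), by = .(Key, year(Date), month(Date))]
    row_weighted <- DT[, .(avg = mean(Runoff2)), by = .(Key, month = month(Date))]

    # Aggregate first (one row per Key, year and month), then average
    monthly <- DT[, .(total = sum(Runoff)),
                  by = .(Key, year = year(Date), month = month(Date))]
    unweighted <- monthly[, .(avg = mean(total)), by = .(Key, month)]
    ```

    In this example the monthly totals are 6 and 10, so the aggregate-first version gives 8, while the := version averages 6, 6, 6 and 10 and gives 7.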
