Calculate average monthly total by groups from data.table in R

后端未结

关注

 3  1972

I have a data.table with a row for each day over a 30 year period with a number of different variable columns. The reason for using data.table is that the .csv file I\'m usi

相关标签:

3条回答

抹茶落季

2020-12-30 08:02

Since you said in your question that you would be open to a completely new solution, you could try the following with dplyr:

df$Date <- as.Date(df$Date, format="%Y-%m-%d")
df$Year.Month <- format(df$Date, '%Y-%m')
df$Month <- format(df$Date, '%m')

require(dplyr)

df %>%
  group_by(Key, Year.Month, Month) %>%
  summarize(Runoff = sum(Runoff)) %>%
  ungroup() %>%
  group_by(Key, Month) %>%
  summarize(mean(Runoff))

EDIT #1 after comment by @Henrik: The same can be done by:

df %>%
  group_by(Key, Month, Year.Month) %>%
  summarize(Runoff = sum(Runoff)) %>%
  summarize(mean(Runoff))

EDIT #2 to round things up: This is another way of doing it (the second grouping is more explicit this way) thanks to @Henrik for his comments

df %>%
  group_by(Key, Month, Year.Month) %>%
  summarize(Runoff = sum(Runoff)) %>%
  group_by(Key, Month, add = FALSE) %>%    #now grouping by Key and Month, but not Year.Month
  summarize(mean(Runoff))

It produces the following result:

#Source: local data frame [2 x 3]
#Groups: Key
#
#  Key Month mean(Runoff)
#1   A    01     4.366667
#2   B    01     3.266667

You can then reshape the output to match your desired output using e.g. reshape2. Suppose you stored the output of the above operation in a data.frame df2, then you could do:

require(reshape2)

df2 <- dcast(df2, Key  ~ Month, sum, value.var = "mean(Runoff)")

0 讨论(0)

-上瘾入骨i

2020-12-30 08:09
If you're not looking for complicated functions and just want the mean, then the following should suffice:
```
DT[, sum(Runoff) / length(unique(year(Date))), list(Key, month(Date))]
#   Key month       V1
#1:   A     1 4.366667
#2:   B     1 3.266667
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
爱一瞬间的悲伤

2020-12-30 08:11
They only way I could think of doing it was in two steps. Probably not the best way, but here goes
```
DT[, c("YM", "Month") := list(substr(Date, 1, 7), substr(Date, 6, 7))]
DT[, Runoff2 := sum(Runoff), by = c("Key", "YM")]
DT[, mean(Runoff2), by = c("Key", "Month")]

##   Key Month       V1
## 1:   A    01 4.366667
## 2:   B    01 3.266667
```
Just to show another (very similar) way:
```
DT[, c("year", "month") := list(year(Date), month(Date))]
DT[, Runoff2 := sum(Runoff), by=list(Key, year, month)]
DT[, mean(Runoff2), by=list(Key, month)]
```
Note that you don't have to create new columns, as by supports expressions as well. That is, you can directly use them in by as follows:
```
DT[, Runoff2 := sum(Runoff), by=list(Key, year = year(Date), month = month(Date))]
```
But since you require to aggregate more than once, it's better (for speed) to store them as additional columns, as @David has shown here.
0 讨论(0)
发布评论:

提交评论
- 加载中...