I have a data.table with a row for each day over a 30 year period with a number of different variable columns. The reason for using data.table is that the .csv file I'm usi
Since you said in your question that you would be open to a completely new solution, you could try the following with dplyr:
df$Date <- as.Date(df$Date, format="%Y-%m-%d")
df$Year.Month <- format(df$Date, '%Y-%m')
df$Month <- format(df$Date, '%m')
library(dplyr)
df %>%
  group_by(Key, Year.Month, Month) %>%
  summarize(Runoff = sum(Runoff)) %>%
  ungroup() %>%
  group_by(Key, Month) %>%
  summarize(mean(Runoff))
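EDIT #1 after comment by @Henrik: The same can be done more compactly. Each summarize() drops the last grouping variable (here Year.Month), so the second summarize() already operates by Key and Month: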
df %>%
group_by(Key, Month, Year.Month) %>%
summarize(Runoff = sum(Runoff)) %>%
summarize(mean(Runoff))
EDIT #2 to round things off: This is another way of doing it, in which the second grouping is more explicit (thanks to @Henrik for his comments):
df %>%
group_by(Key, Month, Year.Month) %>%
summarize(Runoff = sum(Runoff)) %>%
group_by(Key, Month, .add = FALSE) %>% # now grouping by Key and Month, but not Year.Month (the argument was named `add` before dplyr 1.0.0)
summarize(mean(Runoff))
It produces the following result:
#Source: local data frame [2 x 3]
#Groups: Key
#
# Key Month mean(Runoff)
#1 A 01 4.366667
#2 B 01 3.266667
You can then reshape the output to match your desired output using e.g. reshape2. Suppose you stored the output of the above operation in a data.frame df2, then you could do:
require(reshape2)
df2 <- dcast(df2, Key ~ Month, sum, value.var = "mean(Runoff)")
If you're not looking for complicated functions and just want the mean, then the following should suffice:
DT[, sum(Runoff) / length(unique(year(Date))), list(Key, month(Date))]
# Key month V1
#1: A 1 4.366667
#2: B 1 3.266667
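The one-liner works because the mean of the monthly totals equals the grand total for that calendar month divided by the number of years seen in the group. A minimal sketch to check this on toy data follows; the table contents are hypothetical (they are not the data from the question), and uniqueN() is used as a shorthand for length(unique()):

```r
library(data.table)

# Hypothetical toy data: daily rows for January of three years, one key.
DT <- data.table(
  Key    = "A",
  Date   = c(seq(as.Date("2001-01-01"), by = "day", length.out = 31),
             seq(as.Date("2002-01-01"), by = "day", length.out = 31),
             seq(as.Date("2003-01-01"), by = "day", length.out = 31)),
  Runoff = rep(c(0.1, 0.2, 0.3), each = 31)
)

# Shortcut: total runoff per Key/month divided by the number of distinct years.
shortcut <- DT[, .(mean_runoff = sum(Runoff) / uniqueN(year(Date))),
               by = .(Key, month = month(Date))]

# Reference: compute each year's January total, then average the totals.
yearly_totals <- DT[, sum(Runoff), by = .(year = year(Date))]$V1
reference <- mean(yearly_totals)

print(shortcut)
print(all.equal(shortcut$mean_runoff, reference))
```

Note that a year only counts towards the denominator if the group has at least one row for it, so a month that is missing entirely in some year is left out of the average rather than treated as zero.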
The only way I could think of doing it was in two steps. Probably not the best way, but here goes:
DT[, c("YM", "Month") := list(substr(Date, 1, 7), substr(Date, 6, 7))]
DT[, Runoff2 := sum(Runoff), by = c("Key", "YM")]
DT[, mean(Runoff2), by = c("Key", "Month")]
## Key Month V1
## 1: A 01 4.366667
## 2: B 01 3.266667
Just to show another (very similar) way:
DT[, c("year", "month") := list(year(Date), month(Date))]
DT[, Runoff2 := sum(Runoff), by=list(Key, year, month)]
DT[, mean(Runoff2), by=list(Key, month)]
Note that you don't have to create new columns, as by supports expressions as well. That is, you can directly use them in by as follows:
DT[, Runoff2 := sum(Runoff), by=list(Key, year = year(Date), month = month(Date))]
But since you need to aggregate more than once, it's better (for speed) to store them as additional columns, as @David has shown here.
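One thing to watch with the := variants: they copy each monthly total onto every daily row, so the final mean() is weighted by the number of days each year contributes. That makes no difference for January, but it can for February or for months with missing days. A possible alternative, sketched here on hypothetical toy data (not the data from the question), is to aggregate to one row per Key/year/month first and chain the second aggregation:

```r
library(data.table)

# Hypothetical toy data: two keys, daily rows for Jan-Feb of 2000 (leap year) and 2001.
dates <- c(seq(as.Date("2000-01-01"), as.Date("2000-02-29"), by = "day"),
           seq(as.Date("2001-01-01"), as.Date("2001-02-28"), by = "day"))
DT <- data.table(
  Key    = rep(c("A", "B"), each = length(dates)),
  Date   = rep(dates, times = 2),
  Runoff = 1  # constant runoff, so each monthly total equals its day count
)

# Step 1: one row per Key/year/month holding the monthly total.
# Step 2: average those totals across years for each Key/month.
monthly_mean <- DT[
  , .(Runoff = sum(Runoff)), by = .(Key, year = year(Date), month = month(Date))
][
  , .(mean_runoff = mean(Runoff)), by = .(Key, month)
]

print(monthly_mean)
```

With this toy data the February totals are 29 and 28, so the chained version gives 28.5 for February, whereas averaging a per-row Runoff2 column would lean slightly towards the leap year. The chain also leaves DT unmodified.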