Question
So let's take the following data.table. It has dates and a column of numbers. I'd like to get the week of each date and then aggregate (sum) the numbers over every two weeks.
library(data.table)
Date <- as.Date(c("1980-01-01", "1980-01-02", "1981-01-05", "1981-01-05", "1982-01-08", "1982-01-15", "1980-01-16", "1980-01-17",
"1981-01-18", "1981-01-22", "1982-01-24", "1982-01-26"))
Runoff <- c(2, 1, 0.1, 3, 2, 5, 1.5, 0.5, 0.3, 2, 1.5, 4)
DT <- data.table(Date, Runoff)
DT
So from the date, I can easily get the year and week.
DT[,c("Date_YrWeek") := paste(substr(Date,1,4), week(Date), sep="-")][]
What I'm struggling with is aggregating over every two weeks. I thought I'd get the first date of each week and filter using those values. Unfortunately, that would be pretty foolish.
DT[,.(min(Date)),by=.(Date_YrWeek)][order(Date)]
The final result would be the sum for every two weeks:
weeks sum_value
1 and 2 ...
3 and 4 ...
5 and 6 ...
Does anyone have an efficient way to do this with data.table?
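A minimal base-R sketch of the grouping described in the question, assuming the two-week periods restart on 1 January of each year so that the labels read "1 and 2", "3 and 4", and so on. The names `dat`, `year`, `pair` and `weeks` are illustrative. Because a year has 365 or 366 days, the last bucket of each year ("53 and 54") covers only 31 December, plus 30 December in leap years.

```r
# Example data from the question
Date <- as.Date(c("1980-01-01", "1980-01-02", "1981-01-05", "1981-01-05",
                  "1982-01-08", "1982-01-15", "1980-01-16", "1980-01-17",
                  "1981-01-18", "1981-01-22", "1982-01-24", "1982-01-26"))
Runoff <- c(2, 1, 0.1, 3, 2, 5, 1.5, 0.5, 0.3, 2, 1.5, 4)
dat <- data.frame(Date, Runoff)

# Calendar year and 0-based day of the year (POSIXlt$yday is 0 on 1 January)
dat$year <- as.integer(format(dat$Date, "%Y"))
yday0 <- as.POSIXlt(dat$Date)$yday

# Two-week bucket within the year: 0 = days 1-14, 1 = days 15-28, ...
dat$pair <- yday0 %/% 14
dat$weeks <- paste(2 * dat$pair + 1, "and", 2 * dat$pair + 2)

# Sum Runoff per (year, bucket); sort by the numeric bucket, not the label text
res <- aggregate(Runoff ~ year + pair + weeks, data = dat, FUN = sum)
res <- res[order(res$year, res$pair), c("year", "weeks", "Runoff")]
names(res)[3] <- "sum_value"
rownames(res) <- NULL
res
```

In data.table the same grouping could be sketched as `keyby = .(year = year(Date), pair = (yday(Date) - 1L) %/% 14L)`, since data.table's `yday()` counts from 1 on 1 January.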
Answer 1:
1) Define the two-week periods as starting from the minimum Date. Then we can get the total Runoff for each such period like this:
DT[, .(sum_value = sum(Runoff)),
keyby = .(Date = 14 * (as.numeric(Date - min(Date)) %/% 14) + min(Date))]
giving the following, where the Date column is the first day of each two-week period:
Date sum_value
1: 1980-01-01 3.0
2: 1980-01-15 2.0
3: 1980-12-30 3.1
4: 1981-01-13 2.3
5: 1981-12-29 2.0
6: 1982-01-12 6.5
7: 1982-01-26 4.0
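To make the arithmetic in (1) explicit, here is a minimal base-R sketch of the same binning: the number of days since the earliest date is integer-divided by 14, and each bin's first day is recovered as `min(Date) + 14 * bin`. It uses base `aggregate()` instead of data.table, and `start`, `bin` and `res` are illustrative names.

```r
Date <- as.Date(c("1980-01-01", "1980-01-02", "1981-01-05", "1981-01-05",
                  "1982-01-08", "1982-01-15", "1980-01-16", "1980-01-17",
                  "1981-01-18", "1981-01-22", "1982-01-24", "1982-01-26"))
Runoff <- c(2, 1, 0.1, 3, 2, 5, 1.5, 0.5, 0.3, 2, 1.5, 4)

start <- min(Date)                        # earliest date anchors every period
bin <- as.numeric(Date - start) %/% 14    # 0 for days 0-13, 1 for days 14-27, ...

# Total Runoff per bin, then convert each bin back to its first day
res <- aggregate(list(sum_value = Runoff), by = list(bin = bin), FUN = sum)
res$Date <- start + 14 * res$bin
res[, c("Date", "sum_value")]
```

Because the periods are anchored at the earliest observation, adding data that starts earlier would shift every period boundary.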
2) If you prefer the text shown in the question for the first column, then:
DT[, .(sum_value = sum(Runoff)),
keyby = .(two_week = as.numeric(Date - min(Date)) %/% 14)][
, .(weeks = paste(2*two_week + 1, "and", 2*two_week + 2), sum_value)]
giving:
weeks sum_value
1: 1 and 2 3.0
2: 3 and 4 2.0
3: 53 and 54 3.1
4: 55 and 56 2.3
5: 105 and 106 2.0
6: 107 and 108 6.5
7: 109 and 110 4.0
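The labels in (2) count weeks continuously from the earliest date rather than restarting each year, which is why the January 1981 rows read "53 and 54". A small base-R sketch that maps each label back to the calendar range it covers, assuming `two_week` is computed as in (1); `lookup`, `from` and `to` are illustrative names.

```r
Date <- as.Date(c("1980-01-01", "1980-01-02", "1981-01-05", "1981-01-05",
                  "1982-01-08", "1982-01-15", "1980-01-16", "1980-01-17",
                  "1981-01-18", "1981-01-22", "1982-01-24", "1982-01-26"))

start <- min(Date)
two_week <- sort(unique(as.numeric(Date - start) %/% 14))

# Same label text as in (2), plus the first and last day each label covers
lookup <- data.frame(
  weeks = paste(2 * two_week + 1, "and", 2 * two_week + 2),
  from  = start + 14 * two_week,
  to    = start + 14 * two_week + 13
)
lookup
```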
Update: Revised and added (2).
Answer 2:
With tidyverse and lubridate:
library(tidyverse)
library(lubridate)
summary <- DT %>%
mutate(TwoWeeks = round_date(Date, "2 weeks")) %>%
group_by(TwoWeeks) %>%
summarise(sum_value = sum(Runoff))
summary
# A tibble: 9 × 2
TwoWeeks sum_value
<date> <dbl>
1 1979-12-30 3.0
2 1980-01-13 1.5
3 1980-01-20 0.5
4 1981-01-04 3.1
5 1981-01-18 0.3
6 1981-01-25 2.0
7 1982-01-10 2.0
8 1982-01-17 5.0
9 1982-01-24 5.5
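lubridate's round_date() rounds each date to a period boundary whose size and unit you specify, in this case "2 weeks". Note that it rounds to the nearest boundary rather than down to the start of the period: in the output above, 1980-01-17 is rounded forward to 1980-01-20. Consecutive labels such as 1980-01-13 and 1980-01-20 are also only one week apart, so these rows are not true two-week totals. lubridate's floor_date() is the function that rounds down to the start of the period.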
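If each row should instead carry the first day of its period, dates have to be rounded down rather than to the nearest boundary. A minimal base-R sketch of flooring to two-week periods, assuming (hypothetically) that periods start on Sundays counted from 1979-12-30, the first label in the output above; `origin` and `TwoWeeks` are illustrative names.

```r
Date <- as.Date(c("1980-01-01", "1980-01-02", "1981-01-05", "1981-01-05",
                  "1982-01-08", "1982-01-15", "1980-01-16", "1980-01-17",
                  "1981-01-18", "1981-01-22", "1982-01-24", "1982-01-26"))
Runoff <- c(2, 1, 0.1, 3, 2, 5, 1.5, 0.5, 0.3, 2, 1.5, 4)

origin <- as.Date("1979-12-30")   # hypothetical anchor: a Sunday before all dates

# Round down: count whole 14-day steps since the anchor
bin <- as.numeric(Date - origin) %/% 14
period_start <- origin + 14 * bin  # always <= Date and less than 14 days before it

res <- aggregate(list(sum_value = Runoff), by = list(bin = bin), FUN = sum)
res$TwoWeeks <- origin + 14 * res$bin
res[, c("TwoWeeks", "sum_value")]
```

Unlike the round_date() output above, 1980-01-16 and 1980-01-17 now fall in the same period (starting 1980-01-13), giving seven true two-week totals.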
Source: https://stackoverflow.com/questions/45525386/data-table-aggregate-by-every-two-weeks