Question
I need to calculate seasonal averages for my data every year, where the season does not fall within a single calendar year. I have defined the season by date and want to calculate the average temperature, precipitation, etc. for that period every year (e.g. 12/21/1981 to 02/15/1982, 12/21/1982 to 02/15/1983, and so on).
Is there an efficient way of doing this in R?
Below is my data:
library(xts)
seq <- timeBasedSeq('1981-01-01/1985-06-30')
Data <- xts(1:length(seq),seq)
Thanks
Answer 1:
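If we push the time forward by 11 days, December 21 lands on January 1 and February 15 lands on February 26, so each season falls within a single calendar year. Let tt be the shifted date vector and ok be a logical vector which is TRUE if the corresponding tt element falls before February 26th. Note that the strict < comparison used below leaves out February 15 itself; use <= if the end date should be included. Finally, aggregate Data[ok] by the year in which each period ends.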
tt <- time(Data) + 11
ok <- format(tt, "%m-%d") < "02-26"
aggregate(Data[ok], as.integer(as.yearmon(tt))[ok], mean)
giving:
1981 23.0
1982 382.5
1983 747.5
1984 1112.5
1985 1478.5
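To see what the 11-day shift does at the edges of a season, here is a minimal base R sketch on a handful of boundary dates. The dates and the names d, ok_strict, ok_incl and season are illustrative only; the strict comparison is the one used in the code above, and the inclusive one is a possible variant if February 15 should count.

```r
# Boundary dates of one season, plus one day on either side (illustrative values).
d <- as.Date(c("1981-12-20", "1981-12-21", "1982-02-14", "1982-02-15", "1982-02-16"))

# Shift forward 11 days so that December 21 lands on January 1.
tt <- d + 11

# Strict comparison, as in the answer: keeps Dec 21 through Feb 14.
ok_strict <- format(tt, "%m-%d") < "02-26"

# Inclusive variant: also keeps Feb 15.
ok_incl <- format(tt, "%m-%d") <= "02-26"

# Season label = calendar year of the shifted date (the year the season ends in).
season <- as.integer(format(tt, "%Y"))

data.frame(date = d, shifted = tt, ok_strict, ok_incl, season)
```

December 20 and February 16 are dropped by both comparisons, and only the inclusive one keeps February 15.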
If you want to do it without xts then, assuming the input is a data frame DF whose first column holds the dates, try this:
DF <- fortify.zoo(Data) # input
tt <- DF[, 1] + 11
ok <- format(tt, "%m-%d") < "02-26"
year <- as.numeric(format(tt, "%Y"))
aggregate(DF[ok, -1, drop = FALSE], list(year = year[ok]), mean)
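Since the question mentions several variables, here is a sketch of the same data frame approach with more than one value column. The columns temp and precip are hypothetical placeholders filled with made-up sequences, not real measurements; aggregate() averages every column passed to it.

```r
# Hypothetical input: one date column and two made-up value columns.
dates <- seq(as.Date("1981-01-01"), as.Date("1985-06-30"), by = "day")
DF <- data.frame(date = dates,
                 temp = seq_along(dates),        # placeholder values
                 precip = 2 * seq_along(dates))  # placeholder values

tt <- DF$date + 11                         # shift so Dec 21 becomes Jan 1
ok <- format(tt, "%m-%d") < "02-26"        # strict: keeps Dec 21 through Feb 14
year <- as.integer(format(tt, "%Y"))       # year in which the season ends

# One row per season, one mean per value column.
res <- aggregate(DF[ok, -1, drop = FALSE], list(year = year[ok]), mean)
res
```

The temp column reproduces the means shown for the xts version, because it holds the same 1, 2, 3, ... sequence.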
Answer 2:
Here's a data-frame centered approach using tidyverse grammar (which could be translated to base R, if you prefer):
library(tidyverse)
df_in <- tibble(
  date = seq(as.Date('1981-01-01'), as.Date('1985-06-30'), by = 'day'),
  x = seq_along(date)
)
str(df_in)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 1642 obs. of 2 variables:
#> $ date: Date, format: "1981-01-01" "1981-01-02" ...
#> $ x : int 1 2 3 4 5 6 7 8 9 10 ...
df_out <- df_in %>%
  # reformat data to keep months and days, but use identical year, so...
  mutate(same_year = as.Date(format(date, '1970-%m-%d'))) %>%
  # ...we can subset to rows we care about with simpler logic
  # (strict bounds: Dec 21 and Feb 15 themselves are excluded)
  filter(same_year < as.Date('1970-02-15') | same_year > as.Date('1970-12-21')) %>%
  # shift so all in one year and use for grouping
  group_by(run = as.integer(format(date - 60, '%Y'))) %>%
  summarise( # aggregate each group
    start_date = min(date),
    end_date = max(date),
    mean_x = mean(x)
  )
df_out
#> # A tibble: 5 x 4
#> run start_date end_date mean_x
#> <int> <date> <date> <dbl>
#> 1 1980 1981-01-01 1981-02-14 23
#> 2 1981 1981-12-22 1982-02-14 383
#> 3 1982 1982-12-22 1983-02-14 748
#> 4 1983 1983-12-22 1984-02-14 1113
#> 5 1984 1984-12-22 1985-02-14 1479
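For anyone who prefers base R, the pipeline above might be translated roughly as follows. This is a sketch rather than a drop-in equivalent: it compares month-day strings instead of building dates in a fixed year, and the names md, in_season and df_season are introduced here for illustration.

```r
# Same input as above, built with base R only.
df_in <- data.frame(date = seq(as.Date("1981-01-01"), as.Date("1985-06-30"), by = "day"))
df_in$x <- seq_along(df_in$date)

# Month-day key as a string, so no dates in a fixed year are needed.
md <- format(df_in$date, "%m-%d")

# Same strict bounds as the filter() call: Dec 21 and Feb 15 are excluded.
in_season <- md < "02-15" | md > "12-21"
df_season <- df_in[in_season, ]

# Shift back 60 days so that each season falls within one calendar year.
df_season$run <- as.integer(format(df_season$date - 60, "%Y"))

# Summarise each season into one row, then stack the rows.
df_out <- do.call(rbind, lapply(split(df_season, df_season$run), function(g) {
  data.frame(run = g$run[1],
             start_date = min(g$date),
             end_date = max(g$date),
             mean_x = mean(g$x))
}))
rownames(df_out) <- NULL
df_out
```

Changing the two comparisons to <= and >= would include the boundary dates, if that is what the season definition calls for.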
Source: https://stackoverflow.com/questions/54413176/calculating-average-for-certain-time-period-in-every-year