I have a dataset filled with the average windspeed per hour for multiple years. I would like to create an 'average year', in which for each hour the average windspeed for
This is a pretty old post, but I wanted to add that timeAverage in the openair package can probably also be used. The manual describes more options for the timeAverage function.
You can use substr to extract the part of the date you want, and then use tapply or ddply to aggregate the data.
tapply(
  data.multipleyears$Windspeed,
  substr(data.multipleyears$DATETIME, 6, 19),
  mean
)
# 01-01 01:00:00 02-29 12:00:00 05-03 09:00:00
#              9              3              5
library(plyr)
ddply(
  data.multipleyears,
  .(when = substr(DATETIME, 6, 19)),
  summarize,
  Windspeed = mean(Windspeed)
)
#             when Windspeed
# 1 01-01 01:00:00         9
# 2 02-29 12:00:00         3
# 3 05-03 09:00:00         5
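If DATETIME is stored as POSIXct rather than as a character string, the same year-free key can be built with format() instead of substr(). Here is a minimal sketch in base R. The toy data frame and its values are made up for illustration, and the column name when is just a label chosen here:

```r
# Toy data (hypothetical values): the same hours observed in different years
data.multipleyears <- data.frame(
  DATETIME = as.POSIXct(
    c("2009-01-01 01:00:00", "2010-01-01 01:00:00", "2011-01-01 01:00:00",
      "2009-05-03 09:00:00", "2010-05-03 09:00:00"),
    tz = "GMT"
  ),
  Windspeed = c(8, 9, 10, 4, 6)
)

# Build a "month-day hour" key that drops the year
data.multipleyears$when <- format(data.multipleyears$DATETIME, "%m-%d %H:%M:%S")

# Average over all years for each key
average.year <- aggregate(Windspeed ~ when, data = data.multipleyears, FUN = mean)
average.year
```

Using format() with an explicit pattern avoids relying on how a date-time happens to be converted to text before substr() cuts it.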
I predict that ddply and the plyr package are going to be your best friend :). I created a 30 year dataset with hourly random windspeeds between 1 and 10 m/s:
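begin_date = as.POSIXlt("1990-01-01", tz = "GMT")
# 30 year dataset of hourly timestamps (30 * 365 days, so leap days are ignored)
dat = data.frame(dt = begin_date + (0:(24*30*365)) * (3600))
dat = within(dat, {
  speed = runif(length(dt), 1, 10)
  # tz is passed explicitly so the day key is computed in GMT, not the local time zone
  unique_day = strftime(dt, "%d-%m", tz = "GMT")
})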
> head(dat)
                   dt unique_day    speed
1 1990-01-01 00:00:00      01-01 7.054124
2 1990-01-01 01:00:00      01-01 2.202591
3 1990-01-01 02:00:00      01-01 4.111633
4 1990-01-01 03:00:00      01-01 2.687808
5 1990-01-01 04:00:00      01-01 8.643168
6 1990-01-01 05:00:00      01-01 5.499421
To calculate the daily normals (30 year averages; the term is widely used in meteorology) over this 30 year period:
library(plyr)
res = ddply(dat, .(unique_day),
            summarise, mean_speed = mean(speed), .progress = "text")
> head(res)
  unique_day mean_speed
1      01-01   5.314061
2      01-02   5.677753
3      01-03   5.395054
4      01-04   5.236488
5      01-05   5.436896
6      01-06   5.544966
This takes just a few seconds on my humble two core AMD, so I suspect that going through the data only once is not needed. Several of these ddply calls for different aggregations (month, season, etc.) can be done separately.
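As a rough illustration of running separate aggregations, the sketch below computes an hourly "average year" and monthly means from the same data in two independent passes. It uses base R tapply rather than ddply so it runs without extra packages. The three year date range and the key names hour_of_year and month are assumptions made for this example:

```r
# Three years of hourly timestamps (illustrative range; speeds are random)
set.seed(1)
dt <- seq(as.POSIXct("1990-01-01 00:00:00", tz = "GMT"),
          as.POSIXct("1992-12-31 23:00:00", tz = "GMT"),
          by = "hour")
dat <- data.frame(dt = dt, speed = runif(length(dt), 1, 10))

# Keys that drop the year: hour of the year, and calendar month
dat$hour_of_year <- format(dat$dt, "%m-%d %H")
dat$month <- format(dat$dt, "%m")

# One aggregation per key; each pass over the data is cheap
hourly_normal <- tapply(dat$speed, dat$hour_of_year, mean)
monthly_normal <- tapply(dat$speed, dat$month, mean)

length(hourly_normal)   # 8784 = 366 days * 24 hours, because 1992 is a leap year
length(monthly_normal)  # 12
```

Note that the keys for 29 February are averaged over fewer years than the others, since they only occur in leap years.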