Question
I have a dataset with precipitation records for every minute, for 6 different stations. I'd like to have summations for every 5 minutes, for every station. These are the first 5 rows of my dataset (in total I have 17280 rows):
  P_alex P_hvh P_merlijn P_pascal P_thurlede P_tosca                date
       0     0         0        0          0       0 2011-06-27 22:00:00
       0     1         5        2          0       0 2011-06-27 22:01:00
       0     0         0        0          0       0 2011-06-27 22:02:00
       0     6         2        3          0       0 2011-06-27 22:03:00
       0     0         0        0          0       0 2011-06-27 22:04:00
I tried to find help on the internet, but I cannot find an answer that helps me.
I also needed hourly sums. For those I use the following code, but it is no use for other summations:
uur_alex = tapply(disdro$P_alex, as.POSIXct(trunc(disdro$date, "hour")), sum)
Now I would like code that I can use for different summations: 5 minutes (as in the question), but also half an hour. I hope somebody can help me.
Answer 1:
You can use rollapply from the zoo package to achieve this. For example:
library(zoo)
tester <- data.frame(x = 1:100, y = 1:100)
# width = 5 and by = 5: sum each column over consecutive blocks of 5 rows
output <- rollapply(tester, 5, sum, by = 5, by.column = TRUE, align = "right")
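A minimal sketch of the same idea on a time-indexed series, in case the timestamps should be kept. The `rain` data and its column names are made up for illustration. Note that `rollapply` counts rows rather than clock time, so this assumes one row per minute with no gaps.

```r
library(zoo)

# Hypothetical data: 10 consecutive minutes for two stations
times <- as.POSIXct("2011-06-27 22:00:00", tz = "UTC") + 60 * 0:9
rain  <- zoo(cbind(P_a = 1:10, P_b = rep(1, 10)), order.by = times)

# width = 5 together with by = 5 evaluates sum() on
# non-overlapping blocks of 5 rows, column by column
sums <- rollapply(rain, width = 5, FUN = sum, by = 5,
                  by.column = TRUE, align = "right")

print(sums)
```

For half-hour sums on minute data, the same call with `width = 30` and `by = 30` should apply.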
Answer 2:
cut works very nicely with date-time objects, and can therefore be used to create the 5-minute intervals you want to aggregate over. Here's an example:
First, some sample data:
set.seed(1)
mydf <- data.frame(P_alex = sample(0:5, 40, replace = TRUE),
                   P_hvh = sample(0:3, 40, replace = TRUE),
                   date = as.POSIXct("2011-06-27 22:00:00") + 60 * 0:39)
list(head(mydf), tail(mydf))
# [[1]]
#   P_alex P_hvh                date
# 1      1     3 2011-06-27 22:00:00
# 2      2     2 2011-06-27 22:01:00
# 3      3     3 2011-06-27 22:02:00
# 4      5     2 2011-06-27 22:03:00
# 5      1     2 2011-06-27 22:04:00
# 6      5     3 2011-06-27 22:05:00
#
# [[2]]
#    P_alex P_hvh                date
# 35      4     1 2011-06-27 22:34:00
# 36      4     3 2011-06-27 22:35:00
# 37      4     3 2011-06-27 22:36:00
# 38      0     1 2011-06-27 22:37:00
# 39      4     3 2011-06-27 22:38:00
# 40      2     3 2011-06-27 22:39:00
Now, perform your aggregation. In the following example, we aggregate all columns from the original dataset, but drop the "date" variable from the dataset (using mydf[setdiff(names(mydf), "date")]).
# Aggregate all columns by the intervals created with cut.
# For the dataset, we drop the original date column since
# it is no longer needed here. Our function is "sum"
aggregate(. ~ cut(mydf$date, "5 min"),
          mydf[setdiff(names(mydf), "date")],
          sum)
#   cut(mydf$date, "5 min") P_alex P_hvh
# 1     2011-06-27 22:00:00     12    12
# 2     2011-06-27 22:05:00     16     8
# 3     2011-06-27 22:10:00     12     5
# 4     2011-06-27 22:15:00     17     6
# 5     2011-06-27 22:20:00     10     8
# 6     2011-06-27 22:25:00     11     8
# 7     2011-06-27 22:30:00     12     7
# 8     2011-06-27 22:35:00     14    13
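Since the question also asks for half-hour sums, the cut approach can be wrapped in a small function that takes the interval width as an argument. This is only a sketch: `sum_by_interval`, its arguments, and the `rain` data are hypothetical names introduced here, not part of any package.

```r
# Hypothetical helper: sum every non-time column over fixed-width bins.
# `width` is any interval string that cut() accepts for date-times,
# such as "5 min", "30 min" or "hour".
sum_by_interval <- function(df, width, time_col = "date") {
  bins <- cut(df[[time_col]], breaks = width)
  aggregate(df[setdiff(names(df), time_col)],
            by = list(interval = bins),
            FUN = sum)
}

# Made-up data: one hour of minute records for two stations
rain <- data.frame(
  date   = as.POSIXct("2011-06-27 22:00:00", tz = "UTC") + 60 * 0:59,
  P_alex = rep(1, 60),
  P_hvh  = rep(0:1, 30)
)

five_min  <- sum_by_interval(rain, "5 min")   # 12 rows
half_hour <- sum_by_interval(rain, "30 min")  # 2 rows
print(half_hour)
```

One thing to keep in mind: aggregate only returns groups that contain data, so an interval with no rows at all will be missing from the result rather than showing a sum of 0.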
Answer 3:
One way is to map the dates to 5-minute blocks using integer division (%/%). With POSIXct datetimes, the blocks are counted from the UNIX epoch. Then you can sum over these blocks using aggregate.
x <- data.frame(date=Sys.time()+60*0:10,value1=0:10,value2=rnorm(11))
aggregate(.~as.numeric(date)%/%(5*60),data=x,FUN=sum)
#   as.numeric(date)%/%(5 * 60)       date value1     value2
# 1                     4525797 1357739399      0  0.6209565
# 2                     4525798 6788697893     15 -1.4342917
# 3                     4525799 6788699393     40  0.8064627
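Because `.` in the formula expands to every column, the call above also sums the `date` column, which is not meaningful, and the block index is hard to read. A possible variation, sketched here with made-up data and the assumed names `block_sec` and `block_start`, leaves `date` out of the sum and converts each block back to its start time:

```r
# Made-up data: 10 consecutive minutes starting on a 5-minute boundary
x <- data.frame(
  date   = as.POSIXct("2011-06-27 22:00:00", tz = "UTC") + 60 * 0:9,
  value1 = 1:10
)

block_sec <- 5 * 60  # block width in seconds; use 30 * 60 for half hours

# Start of the block each timestamp falls in, in seconds since the epoch
block_start <- (as.numeric(x$date) %/% block_sec) * block_sec

# Sum only the value column(s), grouped by block start
res <- aggregate(x["value1"], by = list(block = block_start), FUN = sum)

# Turn the numeric block start back into a date-time
res$block <- as.POSIXct(res$block, origin = "1970-01-01", tz = "UTC")
print(res)
```

The blocks are aligned to the epoch in UTC, so for 5-minute blocks they start at :00, :05, :10 and so on.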
Answer 4:
If you are familiar with SQL, you can easily write an SQL statement that groups data into 5-minute intervals. For example, in PostgreSQL you can use something like:
select Now(), date_trunc('hour',Now()) + interval '1 minute' * trunc(date_part('minute',Now())/5)*5
I use the sqldf package to do all such transformations.
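As a rough sketch of what that might look like from R: sqldf runs against an in-memory SQLite database by default, not PostgreSQL, so the query below uses plain arithmetic instead of date_trunc. It relies on the assumption that the POSIXct column reaches SQLite as a number of seconds since the epoch; the `rain` data and the `block_start` alias are made up for illustration.

```r
library(sqldf)

# Made-up data: 10 consecutive minutes for one station
rain <- data.frame(
  date   = as.POSIXct("2011-06-27 22:00:00", tz = "UTC") + 60 * 0:9,
  P_alex = 1:10
)

# Assumption: `date` is stored as seconds since the epoch, so dividing
# by 300 and truncating gives the index of the 5-minute block
res <- sqldf("
  SELECT CAST(date / 300 AS INTEGER) * 300 AS block_start,
         SUM(P_alex) AS P_alex
  FROM rain
  GROUP BY block_start
  ORDER BY block_start
")

# Convert the numeric block start back into a date-time
res$block_start <- as.POSIXct(res$block_start, origin = "1970-01-01", tz = "UTC")
print(res)
```

Replacing 300 with 1800 in both places would give half-hour blocks.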
Source: https://stackoverflow.com/questions/14236349/how-to-sum-5-minute-intervals-in-r