I have some data in a CSV file like this:
"Timestamp", "Count"
"2009-07-20 16:30:45", 10
"2009-07-20 16:30:45", 15
"2009-07-20 16:30:46", 8
"2009-07-20 16:30:46"
Read your data and convert it into a zoo object:
R> library(zoo)
R> X <- read.csv("/tmp/so.csv")
R> X <- zoo(X$Count, order.by=as.POSIXct(as.character(X[,1])))
Note that zoo() will show a warning here because the timestamps are not unique.
Task 1, using aggregate with length to count:
R> aggregate(X, force, length)
2009-07-20 16:30:45 2009-07-20 16:30:46 2009-07-20 16:30:47
                  2                   3                   1
Task 2, using aggregate:
R> aggregate(X, force, mean)
2009-07-20 16:30:45 2009-07-20 16:30:46 2009-07-20 16:30:47
             12.500               7.333              20.000
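Why passing force works: force() simply returns its argument, so the grouping key is the index itself and every distinct timestamp becomes one group. Here is a minimal self-contained sketch of both tasks. The sample values and the names ts, counts, n_per_second and mean_per_second are made up for illustration and are not the original CSV.

```r
library(zoo)

# Hypothetical sample data (not the original file): two readings share a second
ts <- as.POSIXct(c("2009-07-20 16:30:45", "2009-07-20 16:30:45",
                   "2009-07-20 16:30:46", "2009-07-20 16:30:47"),
                 tz = "UTC")
counts <- c(10, 15, 8, 20)

# zoo() warns about the duplicated index entries; that is expected here
X <- suppressWarnings(zoo(counts, order.by = ts))

# force() returns the index unchanged, so each distinct timestamp is one group
n_per_second    <- aggregate(X, force, length)
mean_per_second <- aggregate(X, force, mean)

print(n_per_second)
print(mean_per_second)
```

The aggregated objects have a unique index again, so they behave like ordinary zoo series.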
Task 3 can be done the same way by aggregating up to higher-order indices. You can call plot on the result from aggregate:
R> plot(aggregate(X, force, mean))
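One way to read "aggregating up to higher-order indices" is to replace force with a function that truncates each timestamp to the minute or hour. The sketch below assumes that reading. The helper names to_minute and to_hour and the sample values are hypothetical.

```r
library(zoo)

# Hypothetical sample data spanning two minutes and two hours
ts <- as.POSIXct(c("2009-07-20 16:30:45", "2009-07-20 16:30:45",
                   "2009-07-20 16:30:46", "2009-07-20 16:31:10",
                   "2009-07-20 17:05:00"),
                 tz = "UTC")
X <- suppressWarnings(zoo(c(10, 15, 8, 20, 4), order.by = ts))

# trunc() on a date-time returns POSIXlt, so convert back to POSIXct
# to get a usable zoo index
to_minute <- function(t) as.POSIXct(trunc(t, "mins"))
to_hour   <- function(t) as.POSIXct(trunc(t, "hours"))

# Same call as before, but the grouping key is now the truncated timestamp
per_minute <- aggregate(X, to_minute, mean)
per_hour   <- aggregate(X, to_hour, mean)

print(per_minute)
print(per_hour)
```

Swapping mean for length gives the counts per minute or hour, and either result can be passed to plot.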
Averaging the data is easy with the plyr package.
library(plyr)
Second <- ddply(dataset, "Timestamp", function(x){
c(Average = mean(x$Count), N = nrow(x))
})
To do the same thing by minute or hour, you need to add fields with that information.
library(chron)
dataset$Minute <- minutes(dataset$Timestamp)
dataset$Hour <- hours(dataset$Timestamp)
dataset$Day <- dates(dataset$Timestamp)
#aggregate by hour
Hour <- ddply(dataset, c("Day", "Hour"), function(x){
c(Average = mean(x$Count), N = nrow(x))
})
#aggregate by minute
Minute <- ddply(dataset, c("Day", "Hour", "Minute"), function(x){
c(Average = mean(x$Count), N = nrow(x))
})
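If you would rather avoid the extra packages, the same "add a field, then group by it" idea can be sketched in base R. This is a swapped-in alternative, not the plyr/chron code above. It assumes Timestamp is already a POSIXct column, and the sample data and the helper summarise_by are hypothetical.

```r
# Hypothetical sample data; Timestamp is assumed to be POSIXct already
dataset <- data.frame(
  Timestamp = as.POSIXct(c("2009-07-20 16:30:45", "2009-07-20 16:30:45",
                           "2009-07-20 16:30:46", "2009-07-20 16:31:10",
                           "2009-07-20 17:05:00"),
                         tz = "UTC"),
  Count = c(10, 15, 8, 20, 4)
)

# Add grouping fields by formatting the timestamp at the wanted resolution
dataset$Minute <- format(dataset$Timestamp, "%Y-%m-%d %H:%M")
dataset$Hour   <- format(dataset$Timestamp, "%Y-%m-%d %H")

# Hypothetical helper: average and number of rows for each value of `key`
summarise_by <- function(data, key) {
  groups <- split(data$Count, data[[key]])
  data.frame(
    Key     = names(groups),
    Average = unname(vapply(groups, mean, numeric(1))),
    N       = unname(vapply(groups, length, integer(1)))
  )
}

by_minute <- summarise_by(dataset, "Minute")
by_hour   <- summarise_by(dataset, "Hour")

print(by_minute)
print(by_hour)
```

Because the date is part of each key, readings from different days never end up in the same group.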