I have an Excel CSV with a date/time column and a value associated with each date/time. I'm trying to write a script that will go through this format (see below), and find 1) t
If you are dealing with time series data, I suggest you use a time series class like zoo or xts.
dat <- read.table(text=" V1 V2 V3
1 5/1/2012 3:00 1
2 5/1/2012 6:00 2
3 5/1/2012 9:00 5
4 5/1/2012 12:00 3
5 5/1/2012 15:00 6
6 5/1/2012 18:00 2
7 5/1/2012 21:00 1
8 5/2/2012 0:00 2
9 5/2/2012 3:00 3
10 5/2/2012 6:00 6
11 5/2/2012 9:00 4
12 5/2/2012 12:00 6
13 5/2/2012 15:00 7
14 5/2/2012 18:00 9
15 5/2/2012 21:00 1", row.names=1, header=TRUE)
library(xts)
# create an xts object indexed by the combined date and time columns
xobj <- xts(dat[, 3], order.by=as.POSIXct(paste(dat[, 1], dat[, 2]), format="%m/%d/%Y %H:%M"))
If you just want the daily maximums, and you are okay with using the last timestamp of each day as the index, you could use apply.daily:
apply.daily(xobj, max)
# [,1]
#2012-05-01 21:00:00 6
#2012-05-02 21:00:00 9
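If you would rather not load a time series package just for the plain daily maximum, a base R sketch along these lines should give the same two values. It rebuilds the sample data as a data frame so that it runs on its own; the names day and daily_max are only illustrative.

```r
# Same sample data as above, rebuilt so this snippet is self-contained.
dat <- data.frame(
  V1 = rep(c("5/1/2012", "5/2/2012"), times = c(7, 8)),
  V2 = c("3:00", "6:00", "9:00", "12:00", "15:00", "18:00", "21:00",
         "0:00", "3:00", "6:00", "9:00", "12:00", "15:00", "18:00", "21:00"),
  V3 = c(1, 2, 5, 3, 6, 2, 1, 2, 3, 6, 4, 6, 7, 9, 1),
  stringsAsFactors = FALSE
)

# Parse the date column so the groups sort chronologically, not alphabetically.
day <- as.Date(dat$V1, format = "%m/%d/%Y")

# One maximum per calendar day; the result is named by date.
daily_max <- tapply(dat$V3, day, max)
daily_max
```

This only gives the maximum per day, not the time at which it occurred.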
To keep the timestamp at which each maximum occurs, you could do this:
do.call(rbind, lapply(split(xobj, "days"), function(x) x[which.max(x), ]))
# [,1]
#2012-05-01 15:00:00 6
#2012-05-02 18:00:00 9
split(xobj, "days") creates a list with one day's data in each element. lapply applies a function to each day; the function, in this case, simply returns the maximum observation for each day. The lapply call will return a list of xts objects. To turn it back into a single xts object, use do.call. do.call(rbind, X) constructs a call to rbind using each element of the list. It is equivalent to rbind(X[[1]], X[[2]], ..., X[[n]]).
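The same split / lapply / do.call pattern also works on a plain data frame, which may make the three steps easier to inspect one at a time. This is only a sketch in base R, not the xts approach itself; by_day, max_rows and result are illustrative names.

```r
# Same sample data as above, rebuilt so this snippet is self-contained.
dat <- data.frame(
  V1 = rep(c("5/1/2012", "5/2/2012"), times = c(7, 8)),
  V2 = c("3:00", "6:00", "9:00", "12:00", "15:00", "18:00", "21:00",
         "0:00", "3:00", "6:00", "9:00", "12:00", "15:00", "18:00", "21:00"),
  V3 = c(1, 2, 5, 3, 6, 2, 1, 2, 3, 6, 4, 6, 7, 9, 1),
  stringsAsFactors = FALSE
)

# Step 1: split() returns a list with one data frame per day.
by_day <- split(dat, dat$V1)

# Step 2: lapply() keeps, for each day, the row holding the maximum value
# (which.max() picks the first such row if the maximum is tied).
max_rows <- lapply(by_day, function(x) x[which.max(x$V3), ])

# Step 3: do.call() passes every list element to rbind() as an argument,
# stacking the one-row pieces back into a single data frame.
result <- do.call(rbind, max_rows)
result
```

Printing by_day and max_rows before the final step shows what each stage contributes.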
Here you go, using only base R:
dat.str <- ' V1 V2 V3
1 5/1/2012 3:00 1
2 5/1/2012 6:00 2
3 5/1/2012 9:00 5
4 5/1/2012 12:00 3
5 5/1/2012 15:00 6
6 5/1/2012 18:00 2
7 5/1/2012 21:00 1
8 5/2/2012 0:00 2
9 5/2/2012 3:00 3
10 5/2/2012 6:00 6
11 5/2/2012 9:00 4
12 5/2/2012 12:00 6
13 5/2/2012 15:00 7
14 5/2/2012 18:00 9
15 5/2/2012 21:00 1'
dat <- read.table(textConnection(dat.str), row.names=1, header=TRUE)
do.call(rbind,
by(dat, INDICES=dat$V1, FUN=function(x) tail(x[order(x$V3), ], 1)))
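One thing to be aware of with order() plus tail(): if a day's maximum occurs more than once, this picks the last tied row, whereas which.max() picks the first. The question does not say which is wanted, so here is a small sketch of the three options. The toy data frame is hypothetical (not from the question) and exists only to force a tie.

```r
# Hypothetical toy data: the maximum value 6 occurs at both 6:00 and 12:00.
toy <- data.frame(
  V1 = rep("5/1/2012", 4),
  V2 = c("3:00", "6:00", "9:00", "12:00"),
  V3 = c(1, 6, 2, 6),
  stringsAsFactors = FALSE
)

# First occurrence of the maximum.
first_max <- toy[which.max(toy$V3), ]

# Last occurrence: order() leaves ties in their original order,
# so the final row after sorting is the last tied maximum.
last_max <- tail(toy[order(toy$V3), ], 1)

# Every row that reaches the maximum.
all_max <- toy[toy$V3 == max(toy$V3), ]
```

On the sample data in this thread the daily maximums are unique, so all three give the same rows.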
A solution using the plyr package, which I find very elegant for problems like this.
dat.str <- ' V1 V2 V3
1 5/1/2012 3:00 1
2 5/1/2012 6:00 2
3 5/1/2012 9:00 5
4 5/1/2012 12:00 3
5 5/1/2012 15:00 6
6 5/1/2012 18:00 2
7 5/1/2012 21:00 1
8 5/2/2012 0:00 2
9 5/2/2012 3:00 3
10 5/2/2012 6:00 6
11 5/2/2012 9:00 4
12 5/2/2012 12:00 6
13 5/2/2012 15:00 7
14 5/2/2012 18:00 9
15 5/2/2012 21:00 1'
dat <- read.table(textConnection(dat.str), row.names=1, header=TRUE)
library(plyr)
# for each date (V1), keep the row with the largest V3
ddply(dat, .(V1), function(x) {
  x[which.max(x$V3), ]
})
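For another alternative, you could use data.table:
library(data.table)
dat_table <- data.table(dat)
# flag the per-day maximum, keep the flagged rows, then drop the helper column
dat_table[, list(is_max = V3 == max(V3), V2, V3), by = "V1"][which(is_max), ][, is_max := NULL]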
EDIT: as per @MattDowle's comment, an even simpler data.table solution:
dat_table[, .SD[which.max(V3)], by=V1]