Subsetting a dataframe based on daily maxima

遥遥无期 2021-01-27 05:15

I have an Excel CSV file with a date/time column and a value associated with each date/time. I'm trying to write a script that will go through this format (see below), and find 1) t

4 answers
  • 2021-01-27 05:36

    If you are dealing with time series data, I suggest you use a time series class like zoo or xts.

    dat <- read.table(text="         V1    V2 V3
    1  5/1/2012  3:00  1
    2  5/1/2012  6:00  2
    3  5/1/2012  9:00  5
    4  5/1/2012 12:00  3
    5  5/1/2012 15:00  6
    6  5/1/2012 18:00  2
    7  5/1/2012 21:00  1
    8  5/2/2012  0:00  2
    9  5/2/2012  3:00  3
    10 5/2/2012  6:00  6
    11 5/2/2012  9:00  4
    12 5/2/2012 12:00  6
    13 5/2/2012 15:00  7
    14 5/2/2012 18:00  9
    15 5/2/2012 21:00  1", row.names=1, header=TRUE)
    
    library(xts)
    # create an xts object indexed by the combined date and time columns
    xobj <- xts(dat[, 3], order.by=as.POSIXct(paste(dat[, 1], dat[, 2]), format="%m/%d/%Y %H:%M"))
    

    If you just want the daily maxima and are okay with using the last time of each day as the index, you could use apply.daily:

    apply.daily(xobj, max)
    #                    [,1]
    #2012-05-01 21:00:00    6
    #2012-05-02 21:00:00    9
    

    To keep the timestamps at which each maximum occurs, you could do this:

    do.call(rbind, lapply(split(xobj, "days"), function(x) x[which.max(x), ]))
    #                    [,1]
    #2012-05-01 15:00:00    6
    #2012-05-02 18:00:00    9
    

    split(xobj, "days") creates a list with one day's data in each element.

    lapply applies a function to each day; the function, in this case, simply returns the max observation for each day. The lapply call will return a list of xts objects. To turn it back into a single xts object, use do.call.

    do.call(rbind, X) constructs a call to rbind using each element of the list as an argument. It is equivalent to rbind(X[[1]], X[[2]], ..., X[[n]]).
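
    The split / lapply / do.call pattern described above can be sketched in base R on a plain data frame, without xts. This is only an illustration: the data frame df and its columns day and value are made up for the example and are not part of the original data.

    ```r
    # Toy data standing in for the parsed readings (hypothetical values)
    df <- data.frame(
      day   = c("d1", "d1", "d2", "d2"),
      value = c(1, 6, 9, 2)
    )

    # split() returns a list with one data frame per day
    pieces <- split(df, df$day)

    # lapply() keeps the row holding the largest value within each piece
    tops <- lapply(pieces, function(x) x[which.max(x$value), ])

    # do.call() passes the list elements to rbind() as separate arguments ...
    a <- do.call(rbind, tops)

    # ... which selects the same rows as writing the arguments out by hand
    # (row names can differ, because do.call() also passes the list names along)
    b <- rbind(tops[[1]], tops[[2]])
    ```

    Writing the arguments out by hand only works when the number of days is known in advance, which is why do.call is the usual choice here.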

  • 2021-01-27 05:37

    Here you go, a base R approach using by():

    dat.str <- '         V1    V2 V3
    1  5/1/2012  3:00  1
    2  5/1/2012  6:00  2
    3  5/1/2012  9:00  5
    4  5/1/2012 12:00  3
    5  5/1/2012 15:00  6
    6  5/1/2012 18:00  2
    7  5/1/2012 21:00  1
    8  5/2/2012  0:00  2
    9  5/2/2012  3:00  3
    10 5/2/2012  6:00  6
    11 5/2/2012  9:00  4
    12 5/2/2012 12:00  6
    13 5/2/2012 15:00  7
    14 5/2/2012 18:00  9
    15 5/2/2012 21:00  1'
    
    dat <- read.table(textConnection(dat.str), row.names=1, header=TRUE)
    
    do.call(rbind, 
            by(dat, INDICES=dat$V1, FUN=function(x) tail(x[order(x$V3), ], 1)))
    
  • 2021-01-27 05:49

    Here is a solution using the plyr package, which I find very elegant for problems like this.

    dat.str <- '         V1    V2 V3
    1  5/1/2012  3:00  1
    2  5/1/2012  6:00  2
    3  5/1/2012  9:00  5
    4  5/1/2012 12:00  3
    5  5/1/2012 15:00  6
    6  5/1/2012 18:00  2
    7  5/1/2012 21:00  1
    8  5/2/2012  0:00  2
    9  5/2/2012  3:00  3
    10 5/2/2012  6:00  6
    11 5/2/2012  9:00  4
    12 5/2/2012 12:00  6
    13 5/2/2012 15:00  7
    14 5/2/2012 18:00  9
    15 5/2/2012 21:00  1'
    
    dat <- read.table(textConnection(dat.str), row.names=1, header=TRUE)
    
    library(plyr)
    ddply(dat, .(V1), function(x){
       x[which.max(x$V3), ]
    })
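
    One thing to keep in mind with which.max is that it reports only the first position of the maximum, so a day on which the maximum occurs twice yields a single row. If every tied row should be kept, a base R sketch along the following lines may help. The data frame df and its column names are hypothetical and chosen only to produce a tie.

    ```r
    # Hypothetical data in which day "d1" reaches its maximum twice
    df <- data.frame(
      day   = c("d1", "d1", "d1", "d2", "d2"),
      time  = c("3:00", "6:00", "9:00", "3:00", "6:00"),
      value = c(6, 2, 6, 4, 9)
    )

    # which.max() keeps only the first row that attains each day's maximum
    first_only <- do.call(rbind, lapply(split(df, df$day),
                                        function(x) x[which.max(x$value), ]))

    # ave() repeats each day's maximum next to every row of that day,
    # so the comparison keeps all rows that tie for the maximum
    all_ties <- df[df$value == ave(df$value, df$day, FUN = max), ]
    ```

    Here first_only has one row per day, while all_ties also contains the second d1 row with value 6.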
    
  • 2021-01-27 06:00

    For another alternative, you could use data.table:

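    library(data.table)
    dat_table <- data.table(dat)
    
    # flag each day's maximum, keep the flagged rows, then drop the helper column;
    # the trailing [] prints the result, which := would otherwise return invisibly
    dat_table[, list(is_max = V3 == max(V3), V2, V3), by = "V1"][which(is_max), ][, is_max := NULL][]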
    

    EDIT: as per @MattDowle's comment, here is an even simpler data.table solution:

    dat_table[, .SD[which.max(V3)], by=V1]
    
