How to remove partial duplicates from a data frame?

半阙折子戏 2020-12-14 04:25

The data I'm importing describes numeric measurements taken at various locations at more or less evenly spread timestamps. Sometimes this "evenly spread" is not really true.

2 Answers
  • 2020-12-14 05:03

    I would use subset combined with duplicated to filter non-unique timestamps in the second data frame:

    R> df_ <- read.table(textConnection('
                         ts         v
    1 "2009-09-30 10:00:00" -2.081609
    2 "2009-09-30 10:15:00" -2.079778
    3 "2009-09-30 10:15:00" -2.113531
    4 "2009-09-30 10:15:00" -2.124716
    5 "2009-09-30 10:15:00" -2.102117
    6 "2009-09-30 10:30:00" -2.093542
    7 "2009-09-30 10:30:00" -2.092626
    8 "2009-09-30 10:45:00" -2.086339
    9 "2009-09-30 11:00:00" -2.080144
    '), as.is=TRUE, header=TRUE)
    
    R> subset(df_, !duplicated(ts))
                       ts      v
    1 2009-09-30 10:00:00 -2.082
    2 2009-09-30 10:15:00 -2.080
    6 2009-09-30 10:30:00 -2.094
    8 2009-09-30 10:45:00 -2.086
    9 2009-09-30 11:00:00 -2.080
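
    subset(df_, !duplicated(ts)) keeps the first row seen for each timestamp. If the last reading per timestamp is wanted instead, one option is the fromLast argument of duplicated. The sketch below rebuilds the sample data inline so it runs on its own; the name last_per_ts is only illustrative.

    ```r
    # Same sample data as above, rebuilt inline so the snippet is self-contained.
    df_ <- data.frame(
      ts = c("2009-09-30 10:00:00", rep("2009-09-30 10:15:00", 4),
             rep("2009-09-30 10:30:00", 2), "2009-09-30 10:45:00",
             "2009-09-30 11:00:00"),
      v  = c(-2.081609, -2.079778, -2.113531, -2.124716, -2.102117,
             -2.093542, -2.092626, -2.086339, -2.080144),
      stringsAsFactors = FALSE
    )

    # duplicated() flags every repeat after the first occurrence.
    # With fromLast = TRUE it scans from the end, so the last row of each
    # timestamp is the one left unflagged.
    last_per_ts <- df_[!duplicated(df_$ts, fromLast = TRUE), ]
    print(last_per_ts)
    ```

    With this data the rows kept are 1, 5, 7, 8 and 9, one per distinct timestamp.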
    

    Update: To select a specific value for each timestamp, you can use aggregate:

    aggregate(df_$v, by=list(df_$ts), function(x) x[1])  # first value
    aggregate(df_$v, by=list(df_$ts), function(x) tail(x, n=1))  # last value
    aggregate(df_$v, by=list(df_$ts), function(x) max(x))  # max value
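
    The by=list(...) calls above return columns named Group.1 and x. If the original column names should survive, the formula interface of aggregate is one way to get that. A minimal sketch, assuming the same df_ as in the example (rebuilt here so it runs standalone; max_per_ts is an illustrative name):

    ```r
    # Sample data from the answer, rebuilt inline.
    df_ <- data.frame(
      ts = c("2009-09-30 10:00:00", rep("2009-09-30 10:15:00", 4),
             rep("2009-09-30 10:30:00", 2), "2009-09-30 10:45:00",
             "2009-09-30 11:00:00"),
      v  = c(-2.081609, -2.079778, -2.113531, -2.124716, -2.102117,
             -2.093542, -2.092626, -2.086339, -2.080144),
      stringsAsFactors = FALSE
    )

    # "v ~ ts" reads as: summarise v within each distinct value of ts.
    # The result keeps the column names ts and v.
    max_per_ts <- aggregate(v ~ ts, data = df_, FUN = max)
    print(max_per_ts)
    ```

    Swapping FUN for function(x) x[1] or function(x) tail(x, n = 1) gives the first or last value per timestamp, as in the calls above.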
    
  • 2020-12-14 05:04

    I think you are looking for data structures for time-indexed objects, not for a dictionary. For the former, look at the zoo and xts packages, which offer much better time-based subsetting:

    R> library(xts)
    R> X <- xts(data.frame(val=rnorm(10)),
                order.by=Sys.time() + sort(runif(10,10,300)))
    R> X
                            val
    2009-11-20 07:06:17 -1.5564
    2009-11-20 07:06:40 -0.2960
    2009-11-20 07:07:50 -0.4123
    2009-11-20 07:08:18 -1.5574
    2009-11-20 07:08:45 -1.8846
    2009-11-20 07:09:47  0.4550
    2009-11-20 07:09:57  0.9598
    2009-11-20 07:10:11  1.0018
    2009-11-20 07:10:12  1.0747
    2009-11-20 07:10:58  0.7062
    R> X["2009-11-20 07:08::2009-11-20 07:09"]
                            val
    2009-11-20 07:08:18 -1.5574
    2009-11-20 07:08:45 -1.8846
    2009-11-20 07:09:47  0.4550
    2009-11-20 07:09:57  0.9598
    

    The X object is ordered by a time sequence. Make sure the index is of type POSIXct, which means you may need to parse your dates first. Then we can simply index for '7:08 to 7:09 on the given day'.
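
    For timestamps that arrive as character strings, as in the question's data, the parsing step might look like the sketch below. It assumes the xts package is installed, that the timestamps are already unique, and that UTC is an acceptable time zone; ts_chr, idx and window are illustrative names.

    ```r
    library(xts)

    # Timestamps as they would arrive from read.table(): plain character strings.
    ts_chr <- c("2009-09-30 10:00:00", "2009-09-30 10:15:00",
                "2009-09-30 10:30:00", "2009-09-30 10:45:00",
                "2009-09-30 11:00:00")
    v <- c(-2.081609, -2.079778, -2.093542, -2.086339, -2.080144)

    # Parse to POSIXct; the time zone is fixed explicitly (an assumption here)
    # so the result does not depend on the machine's locale.
    idx <- as.POSIXct(ts_chr, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

    # Build the time-indexed object, then select a window by time range.
    X <- xts(data.frame(v = v), order.by = idx)
    window <- X["2009-09-30 10:15/2009-09-30 10:45"]
    print(window)
    ```

    The range string is matched to the minute, so both end points are included in the selection.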
