Efficient way to Fill Time-Series per group

天涯浪人 2020-12-11 11:33

I was looking for a way to fill a time series data set by time, per group. The very inefficient way I was using was to split the data set per group and app

2 Answers
  • 2020-12-11 12:15

    This can also be done with zoo. It is an order of magnitude faster than the code and data in the question, though not as fast as the data.table solution. It could be sped up further if the last line of code shown below is not needed.

    We read d1 into a zoo object z, splitting it by source to give a multivariate time series with one column per source. We then merge that with a zero-width series containing all the hourly times, and fortify the result back to a data frame using the melt = TRUE argument to get a long-form data.frame. If a wide-form multivariate zoo series is acceptable, you can skip the last line, which makes it faster still.

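    library(zoo)
    
    # column 1 of d1 is the group (source), column 2 is the timestamp
    z <- read.zoo(d1, split = 1, index = 2) # wide form: one column per source
    zz <- merge(z, zoo(, seq(start(z), end(z), by = "hour"))) # expand to every hour
    # fortify.zoo is called directly so that ggplot2 does not need to be attached
    fortify.zoo(zz, melt = TRUE) # convert to long form data.frame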
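
    For a self-contained illustration, the same steps can be tried on a small toy data set. This is only a sketch: the data frame below is hypothetical (it is not the d1 from the question), with column names source, grp and cnt assumed, and fortify.zoo is called directly so that ggplot2 is not required.

    ```r
    library(zoo)

    # Hypothetical toy data: two sources with gaps in their hourly series
    t0 <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
    d1 <- data.frame(
      source = c("a", "a", "b", "b"),
      grp    = t0 + 3600 * c(0, 3, 1, 3),  # hours 0 and 3 for a, 1 and 3 for b
      cnt    = c(5, 7, 2, 4)
    )

    # Wide form: one column per source, indexed by time
    z <- read.zoo(d1, split = 1, index = 2)

    # Merge with a zero-width series holding every hour in the overall range
    zz <- merge(z, zoo(, seq(start(z), end(z), by = "hour")))

    # Back to a long-form data.frame (columns Index, Series, Value)
    long <- fortify.zoo(zz, melt = TRUE)
    print(long)
    ```

    Because the wide form shares one index across all columns, every source is filled over the overall time range, with NA wherever that source had no observation.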
    
  • 2020-12-11 12:24

    It appears that data.table is much faster than the tidyverse option. Simply translating the above into data.table (courtesy of @Frank) completed the operation in a little under 3 minutes.

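    library(data.table)
    
    # one row per hour between each source's first and last timestamp
    mDT <- setDT(d1)[, .(grp = seq(min(grp), max(grp), by = "hour")), by = source]
    # join the original data onto the full grid; missing hours get NA
    new_D <- d1[mDT, on = names(mDT)]
    
    new_D[is.na(cnt), cnt := 0] # if needed: replace NA counts with 0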
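
    As a quick check of the approach, here is a minimal sketch on toy data. The table below is hypothetical (it is not the d1 from the question); only the column names source, grp and cnt are taken from the code above.

    ```r
    library(data.table)

    # Hypothetical toy data: two sources with gaps in their hourly series
    t0 <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
    d1 <- data.table(
      source = c("a", "a", "b", "b"),
      grp    = t0 + 3600 * c(0, 3, 1, 3),  # hours 0 and 3 for a, 1 and 3 for b
      cnt    = c(5, 7, 2, 4)
    )

    # Full hourly grid per source, from that source's first to last timestamp
    mDT <- d1[, .(grp = seq(min(grp), max(grp), by = "hour")), by = source]

    # Join the observed rows onto the grid; unobserved hours come back as NA
    new_D <- d1[mDT, on = names(mDT)]

    # Replace the NA counts with 0, by reference
    new_D[is.na(cnt), cnt := 0]
    print(new_D)
    ```

    Here each source is filled only between its own first and last timestamp, so source a gets four rows and source b gets three.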
    