Create dummy variables in one table based on range of dates in another table

佛祖请我去吃肉 2021-01-15 03:08

I have two tables. table1 looks like this:

date          hour    data
2010-05-01     3       5
2010-05-02     7       7
2010-05-02     10


        
1 answer
  •  终归单人心
    2021-01-15 03:54

    Here's one way to do it. It assumes the time precision of table1 is 1 hour. It can be adapted to any precision, but because it builds the full sequence of possible times between date_out and date_back for every outage, it performs much better with coarser time steps. Note that I used slightly different tables from the OP's, to illustrate overlapping intervals and to correct some mistakes in the original.

    library(data.table)
    
    table1 = data.table(date = c("2010-05-01", "2010-05-02", "2010-05-02", "2010-07-03", "2011-12-09", "2012-05-01"), hour = c(3,7,10,18,22,3), data = c(5,7,8,3,1,0))
    outages = data.table(resource = c("joey", "bob", "billy", "bob", "joey"), date_out = c("2010-04-30 4:00:00", "2010-04-30 4:00:00", "2009-04-20 7:00:00", "2011-11-15 12:20:00", "2012-04-28 1:00:00"), date_back=c("2010-05-02 8:30:00", "2010-05-02 8:30:00", "2009-06-02 5:30:00", "2011-12-09 23:00:00", "2012-05-02 17:00:00"))
    
    # round up date_out and round down date_back
    # and create a sequence in-between spaced by 1 hour
    outages[, list(datetime = seq(as.POSIXct(round(as.POSIXct(date_out) + 30*60-1, "hours")),
                                  as.POSIXct(round(as.POSIXct(date_back) - 30*60, "hours")),
                                  60*60)),
              by = list(resource, date_out)] -> outages.expanded
    setkey(outages.expanded, datetime)
    
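    # merge with the original table, then run "table" to get the frequencies/occurrences
    # and cbind back with the original table
    # (j lists both columns explicitly: since data.table 1.9.4 a join with j = resource
    #  returns only the resource vector, which table() cannot cross-tabulate)
    cbind(table1, unclass(table(
                    outages.expanded[table1[, list(datetime=as.POSIXct(paste0(date, " ", hour, ":00:00")))],
                                     list(datetime, resource)])))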
    
    #         date hour data bob joey
    #1: 2010-05-01    3    5   1    1
    #2: 2010-05-02    7    7   1    1
    #3: 2010-05-02   10    8   0    0
    #4: 2010-07-03   18    3   0    0
    #5: 2011-12-09   22    1   1    0
    #6: 2012-05-01    3    0   0    1
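
The rounding trick in the code above may be easier to follow in isolation. Here is a minimal base R sketch of it; the timestamps are taken from the outages table, and the names `out`, `back`, `first_hour` and `last_hour` are only illustrative. I pin the time zone to UTC here (an assumption, the answer above uses the local time zone) so the result does not depend on the machine.

```r
# Illustrative timestamps, in UTC so the result is the same on every machine.
out  <- as.POSIXct(c("2010-04-30 04:00:00", "2011-11-15 12:20:00"), tz = "UTC")
back <- as.POSIXct(c("2010-05-02 08:30:00", "2011-12-09 23:00:00"), tz = "UTC")

# Round up: adding 30 min minus 1 s moves anything past the full hour
# beyond the halfway point, so round() lands on the next hour.
first_hour <- as.POSIXct(round(out + 30*60 - 1, "hours"))

# Round down: subtracting 30 min moves anything short of the next full hour
# below the halfway point, so round() lands on the current hour.
last_hour <- as.POSIXct(round(back - 30*60, "hours"))

format(first_hour, "%Y-%m-%d %H:%M")
# "2010-04-30 04:00" "2011-11-15 13:00"
format(last_hour, "%Y-%m-%d %H:%M")
# "2010-05-02 08:00" "2011-12-09 23:00"
```

Times that already sit on a full hour are left unchanged in both directions, which is why the `- 1` is needed only when rounding up.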
    
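
If expanding every outage into an hourly sequence gets too expensive, a different technique is a non-equi join, which matches each table1 row against the outage intervals directly. This is not what the answer above does, only a sketch of an alternative. It assumes data.table 1.9.8 or later, a reduced version of the tables above, UTC timestamps, and a helper column `datetime` that I add for illustration.

```r
library(data.table)

# Reduced versions of the tables above (UTC is an assumption of this sketch).
table1 <- data.table(date = c("2010-05-01", "2010-05-02", "2010-05-02", "2011-12-09"),
                     hour = c(3, 7, 10, 22),
                     data = c(5, 7, 8, 1))
outages <- data.table(
  resource  = c("joey", "bob", "bob"),
  date_out  = as.POSIXct(c("2010-04-30 04:00:00", "2010-04-30 04:00:00",
                           "2011-11-15 12:20:00"), tz = "UTC"),
  date_back = as.POSIXct(c("2010-05-02 08:30:00", "2010-05-02 08:30:00",
                           "2011-12-09 23:00:00"), tz = "UTC"))

# Hypothetical helper column: the timestamp of each table1 row.
table1[, datetime := as.POSIXct(paste0(date, " ", hour, ":00:00"), tz = "UTC")]

# One dummy column per resource: 1 if any of its outages covers the row's time.
for (r in unique(outages$resource)) {
  # by = .EACHI returns one count per table1 row, in table1 order (0 if no match).
  hits <- outages[resource == r][table1,
                                 on = .(date_out <= datetime, date_back >= datetime),
                                 .N, by = .EACHI]$N
  table1[, (r) := as.integer(hits > 0)]
}

table1[, .(date, hour, data, bob, joey)]
```

Unlike the `table()` approach, this keeps a column of zeros for a resource that never matches, and it yields 0/1 flags rather than counts. Drop the `> 0` comparison if you want the number of overlapping outages instead.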
