Create dummy variables in one table based on range of dates in another table

后端未结

关注

 1  1828

佛祖请我去吃肉 2021-01-15 03:08

I have two tables. table1 looks like this

  date       hour     data
2010-05-01     3        5
2010-05-02     7        7
2010-05-02     10


      
      
        
          1条回答        

        
                    
            
            
                         
                
              
              
                
                   终归单人心
                                             
                
                
                (楼主)
            
              
              
                2021-01-15 03:54
              

            
            
                        
Here's a way of achieving what you want. This assumes your table1's time precision is 1 hour. Though it can be modified to an arbitrary precision, it will perform much better for larger time intervals as it constructs the full sequence of possible times in the date_out-date_back range. Note, I used slightly different tables from OP to illustrate overlapping intervals and to correct some mistakes in OP.

table1 = data.table(date = c("2010-05-01", "2010-05-02", "2010-05-02", "2010-07-03", "2011-12-09", "2012-05-01"), hour = c(3,7,10,18,22,3), data = c(5,7,8,3,1,0))
outages = data.table(resource = c("joey", "bob", "billy", "bob", "joey"), date_out = c("2010-04-30 4:00:00", "2010-04-30 4:00:00", "2009-04-20 7:00:00", "2011-11-15 12:20:00", "2012-04-28 1:00:00"), date_back=c("2010-05-02 8:30:00", "2010-05-02 8:30:00", "2009-06-02 5:30:00", "2011-12-09 23:00:00", "2012-05-02 17:00:00"))

# round up date_out and round down date_back
# and create a sequence in-between spaced by 1 hour
outages[, list(datetime = seq(as.POSIXct(round(as.POSIXct(date_out) + 30*60-1, "hours")),
                              as.POSIXct(round(as.POSIXct(date_back) - 30*60, "hours")),
                              60*60)),
          by = list(resource, date_out)] -> outages.expanded
setkey(outages.expanded, datetime)

# merge with the original table, then run "table" to get the frequencies/occurences
# and cbind back with the original table
cbind(table1, unclass(table(
                outages.expanded[table1[, list(datetime=as.POSIXct(paste0(date, " ", hour, ":00:00")))],
                                 resource])))

#         date hour data bob joey
#1: 2010-05-01    3    5   1    1
#2: 2010-05-02    7    7   1    1
#3: 2010-05-02   10    8   0    0
#4: 2010-07-03   18    3   0    0
#5: 2011-12-09   22    1   1    0
#6: 2012-05-01    3    0   0    1

    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                    
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复