data.table cumulative stats of irregular observations with time window


I have some transactional records, like the following:

library(data.table)
customers      <- 1:75
purchase_dates <- seq( as.Date('2016-01-01'),


                      
2 answers

Answer by 野的像风, 2021-01-22 19:04

I would like to know the prior transaction count and total amount within a 365-day window before each transaction (i.e., from d-365 through d-1 for a transaction on date d).
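Since the window is measured in days rather than calendar years, d-365 lands one day after the same calendar date a year earlier whenever a leap day (e.g. 2016-02-29) falls in between. A quick base-R check of the bounds, using two purchase dates that appear in the output further down:

```r
# Window bounds for a transaction on date d: d - 365 through d - 1
d1 <- as.Date("2016-05-17")
d1 - 365  # "2015-05-18": going back to 2015-05-17 would cross 2016-02-29
d1 - 1    # "2016-05-16"

d2 <- as.Date("2017-03-20")
d2 - 365  # "2016-03-20": no leap day in between, so a purchase exactly
          # one calendar year earlier is still inside the window
```

The second case is why the 2017-03-20 purchase in that output counts the 2016-03-20 purchase among its three prior transactions.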


I think the idiomatic way is:

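# For each transaction, look up the same customer's purchases dated
# purch_dt-365 through purch_dt-1; by=.EACHI aggregates once per lookup row
# N = prior count, V2 = prior amount (0 when nothing falls in the window)
df[, c("ppn", "ppa") := 
  df[.(cust_id = cust_id, d_dn = purch_dt-365, d_up = purch_dt), 
    on=.(cust_id, purch_dt >= d_dn, purch_dt < d_up), 
    .(.N, sum(purch_amt, na.rm=TRUE))
  , by=.EACHI][, .(N, V2)]
]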

     cust_id   purch_dt purch_amt ppn    ppa
  1:       1 2016-03-20     69.65   0   0.00
  2:       1 2016-05-17    413.60   1  69.65
  3:       1 2016-12-25    357.18   2 483.25
  4:       1 2017-03-20    256.21   3 840.43
  5:       2 2016-05-26     49.14   0   0.00
 ---                                        
494:      75 2018-01-12    381.24   2 201.04
495:      75 2018-04-01     65.83   3 582.28
496:      75 2018-06-17    170.30   4 648.11
497:      75 2018-07-22     60.49   5 818.41
498:      75 2018-10-10     66.12   4 677.86


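This is a "non-equi join": the on= clause matches purch_dt against the range d_dn <= purch_dt < d_up instead of requiring equality, so each transaction only meets the same customer's purchases inside its own window, and by=.EACHI aggregates those matches once per transaction.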
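A minimal, self-contained sketch of the same non-equi join, assuming only the five rows printed above as toy data (the four customer-1 purchases and one customer-2 purchase). The separate lookup table is just the answer's inline .(cust_id, d_dn, d_up) list pulled out for readability, and the names ppn/ppa follow the answer:

```r
library(data.table)

# Toy data: the first five rows of the printed output
df <- data.table(
  cust_id   = c(1L, 1L, 1L, 1L, 2L),
  purch_dt  = as.Date(c("2016-03-20", "2016-05-17", "2016-12-25",
                        "2017-03-20", "2016-05-26")),
  purch_amt = c(69.65, 413.60, 357.18, 256.21, 49.14)
)

# One lookup row per transaction: same customer, window bounds d - 365 and d
lookup <- df[, .(cust_id, d_dn = purch_dt - 365, d_up = purch_dt)]

# Non-equi join: match purchases with d_dn <= purch_dt < d_up;
# by = .EACHI evaluates j once per lookup row (.N is 0 when nothing matches)
res <- df[lookup,
          on = .(cust_id, purch_dt >= d_dn, purch_dt < d_up),
          .(ppn = .N, ppa = sum(purch_amt, na.rm = TRUE)),
          by = .EACHI]

# by = .EACHI keeps lookup's row order, so the results line up with df
df[, c("ppn", "ppa") := .(res$ppn, res$ppa)]
df
```

On these rows the counts come out as 0, 1, 2, 3, 0 and the amounts as 0, 69.65, 483.25, 840.43, 0, matching the printed output. As I understand by=.EACHI, j is evaluated lookup row by lookup row, so no table of all matched pairs has to be built and filtered afterwards.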
                                                                        
                                                        
            
            
              
                
Answer by 醉话见心, 2021-01-22 19:10

Here's a Cartesian self-join with a date-range filter:

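# Pair every purchase (x) with every purchase by the same customer (i.)
df_prior <- df[df, on=.(cust_id), allow.cartesian=TRUE
                # keep pairs where the i. purchase is 1 to 365 days earlier
                ][i.purch_dt < purch_dt & 
                    i.purch_dt >= purch_dt - 365
                  # count and total the prior purchases per transaction
                  # (assumes at most one purchase per customer per date)
                  ][, .(prior_purch_cnt = .N, 
                        prior_purch_amt = sum(i.purch_amt)),
                     keyby=.(cust_id, purch_dt)]

# Join back to df: transactions with no prior purchases come back as NA
df2 <- df_prior[df, on=.(cust_id, purch_dt)]

# Replace those NAs with zeros (0L keeps the count column integer)
df2[is.na(prior_purch_cnt), `:=`(prior_purch_cnt=0L,
                                 prior_purch_amt=0
                                 )]
df2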
# cust_id   purch_dt prior_purch_cnt prior_purch_amt purch_amt
#       1 2016-03-20               0            0.00     69.65
#       1 2016-05-17               1           69.65    413.60
#       1 2016-12-25               2          483.25    357.18
#       1 2017-03-20               3          840.43    256.21
#       2 2016-05-26               0            0.00     49.14
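To put a number on how large the unfiltered self-join above gets: joining df to itself on cust_id alone pairs each purchase with every purchase by the same customer, so a customer with k transactions contributes k^2 rows before the date filter runs (1,000 transactions for one customer already means 1,000,000 rows). A small check on the customer ids of the five printed rows, where customer 1 has 4 purchases and customer 2 has 1:

```r
library(data.table)

# Toy data: customer ids of the five printed rows (4 purchases + 1 purchase)
df <- data.table(cust_id = c(1L, 1L, 1L, 1L, 2L))

# Predicted size of df[df, on = .(cust_id)] before any filtering:
# each customer with k purchases contributes k^2 rows
predicted <- df[, .N, by = cust_id][, sum(as.numeric(N)^2)]
predicted  # 4^2 + 1^2 = 17

# Materialize the unfiltered self-join to confirm
joined <- df[df, on = .(cust_id), allow.cartesian = TRUE]
nrow(joined)
```

Running the predicted line on the real table gives the intermediate row count up front, before any memory is committed to the join.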


I'm concerned that the intermediate Cartesian join could blow up before the date filter is applied, on datasets where customers have many prior transactions.
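A different technique that avoids building pairs at all, sketched under the assumption that the data can be re-sorted by customer and date: within one customer's sorted purchases, those dated d-365 through d-1 form a contiguous run, so findInterval() (a binary search) can locate the run's ends and a difference of cumsum() values gives its total. This is neither answer's method, and prior_window_stats() is a hypothetical helper; it reuses the five printed rows as toy data:

```r
library(data.table)

# Toy data: the first five rows of the printed output
df <- data.table(
  cust_id   = c(1L, 1L, 1L, 1L, 2L),
  purch_dt  = as.Date(c("2016-03-20", "2016-05-17", "2016-12-25",
                        "2017-03-20", "2016-05-26")),
  purch_amt = c(69.65, 413.60, 357.18, 256.21, 49.14)
)

# Hypothetical helper: prior count and amount in [d - window, d - 1] for
# one customer's purchases, which must already be sorted by date
prior_window_stats <- function(dt, amt, window = 365) {
  t  <- as.numeric(dt)                   # whole days since 1970-01-01
  cs <- c(0, cumsum(amt))                # cs[k + 1] = total of first k purchases
  hi <- findInterval(t - 1, t)           # purchases dated <= d - 1
  lo <- findInterval(t - window - 1, t)  # purchases dated <= d - window - 1
  list(ppn = hi - lo, ppa = cs[hi + 1] - cs[lo + 1])
}

setkey(df, cust_id, purch_dt)            # sorts in place: customer, then date
df[, c("ppn", "ppa") := prior_window_stats(purch_dt, purch_amt), by = cust_id]
df
```

Work per customer is roughly O(k log k) instead of O(k^2) pairs, and because hi counts only purchases dated d-1 or earlier, same-day purchases are excluded just as in the purch_dt < d_up join condition. Two caveats: setkey() reorders df in place, and amounts obtained as differences of running sums can differ from a direct sum in the last floating-point digits.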
                                                                        
                                                        
            
            
              
                