How to use lag and rangeBetween functions on timestamp values?

鱼传尺愫 2021-02-06 09:09

I have data that looks like this:

userid,eventtime,location_point
4e191908,2017-06-04 03:00:00,18685891
4e191908,2017-06

2 Answers

遥遥无期 2021-02-06 09:32

    rangeBetween just doesn't make sense for a non-aggregate function like lag. lag always takes a specific row, denoted by the offset argument, so specifying a frame is pointless.
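
    For illustration, a minimal sketch of lag over an ordered window (the prev_location name and the partitioning by userid are assumptions based on the sample data):

    from pyspark.sql import Window
    from pyspark.sql.functions import lag

    w = Window.partitionBy("userid").orderBy("eventtime")
    # Each row gets the previous row's location within its partition;
    # no frame specification is needed for lag.
    df.withColumn("prev_location", lag("location_point", 1).over(w))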

    To get a window over a time series you can use window grouping with standard aggregates:

    from pyspark.sql.functions import window, countDistinct

    # Bucket events into five-minute tumbling windows per location
    # and count the distinct users in each bucket.
    (df
        .groupBy("location_point", window("eventtime", "5 minutes"))
        .agg(countDistinct("userid")))
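
    The window expression produces a struct column named window with start and end fields, so the grouped result can be flattened afterwards (a sketch; the id_count alias is an assumption, not from the original):

    result = (df
        .groupBy("location_point", window("eventtime", "5 minutes"))
        .agg(countDistinct("userid").alias("id_count")))
    result.select("location_point", "window.start", "window.end", "id_count")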
    

    window also accepts optional slideDuration and startTime arguments if you want overlapping (sliding) windows.
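
    For example, a five-minute window computed every minute (the one-minute slide is an assumed value, not from the question):

    (df
        .groupBy("location_point", window("eventtime", "5 minutes", "1 minute"))
        .agg(countDistinct("userid")))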

    You can try something similar with window functions if you partition by location:

    from pyspark.sql import Window as W
    from pyspark.sql.functions import col, collect_set, size

    days = lambda i: i * 86400  # seconds in i days

    windowSpec = (W.partitionBy(col("location_point"))
        .orderBy(col("eventtime").cast("timestamp").cast("long"))
        .rangeBetween(0, days(5)))

    # countDistinct is not supported over a window frame in Spark,
    # so take the size of the set of collected ids instead:
    df.withColumn("id_count", size(collect_set("userid").over(windowSpec)))
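
    Casting eventtime to long orders the rows by epoch seconds, so the rangeBetween boundaries are expressed in seconds; days(5) makes the frame run from the current row to five days ahead of it.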
    
