PySpark: Create New Column And Fill In Based on Conditions of Two Other Columns

Frontend · Unresolved · 1 answer · 1648 views
迷失自我 2021-01-03 10:40

I have the following data frame:

+---+---+------+
| id| ts|days_r|
+---+---+------+
|123|  T|    32|
|342|  I|     3|
|349|  L|    10|
+---+---+------+
1 Answer
  • 2021-01-03 11:03

    Your code has a bug: you are missing a set of parentheses on the third line. Here is a way to fix your code, using chained when() statements instead of multiple otherwise() statements:

    import pyspark.sql.functions as F

    df = df.withColumn(
        '0to2_count',
        F.when((F.col('ts') == 'I') & (F.col('days_r') >= 0) & (F.col('days_r') <= 2), 1)
        .when((F.col('ts') == 'T') & (F.col('days_r') >= 0) & (F.col('days_r') <= 48), 1)
        .when((F.col('ts') == 'L') & (F.col('days_r') >= 0) & (F.col('days_r') <= 7), 1)
        .otherwise(0)
    )
    

    An even better way to write this logic is to use pyspark.sql.Column.between():

    df = df.withColumn(
        '0to2_count',
        F.when((F.col('ts') == 'I') & F.col('days_r').between(0, 2), 1)
        .when((F.col('ts') == 'T') & F.col('days_r').between(0, 48), 1)
        .when((F.col('ts') == 'L') & F.col('days_r').between(0, 7), 1)
        .otherwise(0)
    )
    df.show()
    #+---+---+------+----------+
    #| id| ts|days_r|0to2_count|
    #+---+---+------+----------+
    #|123|  T|    32|         1|
    #|342|  I|     3|         0|
    #|349|  L|    10|         0|
    #+---+---+------+----------+
    

    Of course, since all three conditions return the same value, you could simplify this further into a single Boolean condition.
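
    To illustrate that simplification, here is a minimal, self-contained sketch: the three when() branches collapse into one Boolean expression (the branches OR-ed together), and the Boolean column is cast to an integer to get 1/0. The column name '0to2_count' and the sample data are taken from the answer above; the local SparkSession setup is added only to make the example runnable.

    ```python
    import pyspark.sql.functions as F
    from pyspark.sql import SparkSession

    # Local session just for the example
    spark = SparkSession.builder.master("local[1]").appName("example").getOrCreate()

    df = spark.createDataFrame(
        [(123, "T", 32), (342, "I", 3), (349, "L", 10)],
        ["id", "ts", "days_r"],
    )

    # All three branches returned 1, so OR them into one condition
    in_range = (
        ((F.col("ts") == "I") & F.col("days_r").between(0, 2))
        | ((F.col("ts") == "T") & F.col("days_r").between(0, 48))
        | ((F.col("ts") == "L") & F.col("days_r").between(0, 7))
    )

    # Cast the Boolean to int: True -> 1, False -> 0
    df = df.withColumn("0to2_count", in_range.cast("int"))
    df.show()
    ```

    This produces the same 0to2_count values as the chained when() version, in one expression.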
