I am using Spark SQL 2.3.1 and have the below scenario:
Given a dataset:
val ds = Seq(
  (1, "x1", "y1", "0.1992019"),
  (2, null, "y2", "2
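(The definition above is cut off. A dataset along these lines is consistent with the outputs shown in the answers below; row 2's value is an assumption, since only the positions of the nulls matter here.)
// minimal sketch, assuming spark.implicits._ is in scope (it is in spark-shell);
// row 2's value is assumed
val ds = Seq(
  (1, "x1", "y1", "0.1992019"),
  (2, null, "y2", "2.2"),
  (3, "x3", null, "15.34567"),
  (5, "x4", "y4", "0")
).toDF("id", "col_x", "col_y", "value")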
You can also use filter to filter out rows with null values.
scala> val operationCol = "col_x" // for one column
operationCol: String = col_x
scala> ds.filter(col(operationCol).isNotNull).show(false)
+---+-----+-----+---------+
|id |col_x|col_y|value |
+---+-----+-----+---------+
|1 |x1 |y1 |0.1992019|
|3 |x3 |null |15.34567 |
|5 |x4 |y4 |0 |
+---+-----+-----+---------+
scala> val operationCol = Seq("col_x","col_y") // For multiple Columns
operationCol: Seq[String] = List(col_x, col_y)
scala> ds.filter(operationCol.map(col(_).isNotNull).reduce(_ && _)).show
+---+-----+-----+---------+
| id|col_x|col_y| value|
+---+-----+-----+---------+
| 1| x1| y1|0.1992019|
| 5| x4| y4| 0|
+---+-----+-----+---------+
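One caveat with the reduce(_ && _) approach: it throws if the column list is empty. If that can happen, a fold over a literal true is a safe variant (a sketch, not from the original answer):
import org.apache.spark.sql.functions.{col, lit}
val cond = operationCol.foldLeft(lit(true))((acc, c) => acc && col(c).isNotNull)
ds.filter(cond).show()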
You can use df.na.drop() to drop rows that contain nulls. The drop function takes a Seq of the column names to consider, so for a single column name you can write it as follows:
val newDf = df.na.drop(Seq(operationCol))
This will create a new dataframe newDf in which every row with a null in operationCol has been removed.
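If operationCol is the Seq of columns from the second example above, pass it directly instead of wrapping it again; na.drop also accepts a "how" argument to control whether any or all of the listed columns must be null. A short sketch of both forms:
val droppedAny = ds.na.drop(operationCol)        // drop rows with a null in ANY of the listed columns
val droppedAll = ds.na.drop("all", operationCol) // drop rows only when ALL listed columns are null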