I have a Spark DataFrame df. Is there a way of selecting a subset of columns using a list of these columns?

scala> df.columns
res0: Array[String] = Array(a, b, c, d)
You can do it like this:

String[] originCols = ds.columns();
ds.selectExpr(originCols);

This works because selectExpr is a varargs method, so a String[] can be passed directly. Spark's selectExpr source code:
/**
 * Selects a set of SQL expressions. This is a variant of `select` that accepts
 * SQL expressions.
 *
 * {{{
 *   // The following are equivalent:
 *   ds.selectExpr("colA", "colB as newName", "abs(colC)")
 *   ds.select(expr("colA"), expr("colB as newName"), expr("abs(colC)"))
 * }}}
 *
 * @group untypedrel
 * @since 2.0.0
 */
@scala.annotation.varargs
def selectExpr(exprs: String*): DataFrame = {
  select(exprs.map { expr =>
    Column(sparkSession.sessionState.sqlParser.parseExpression(expr))
  }: _*)
}
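The same approach works in Scala; a minimal sketch, assuming cols holds the column names you want (selectExpr parses each string as a SQL expression, so plain column names pass through unchanged):

val cols = Seq("a", "b")   // assumed column names
df.selectExpr(cols: _*)    // the Seq expands into the varargs parameter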
You can convert each String to a Spark Column with the col function:

import org.apache.spark.sql.functions._
df.select(cols.map(col): _*)

where cols is the list of column names you want, e.g. val cols = List("a", "b").
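For context, a self-contained sketch of this approach (the data and column names are assumed for illustration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// assumed example data
val df = Seq((1, "x", true), (2, "y", false)).toDF("a", "b", "c")

val cols = List("a", "b")          // the columns to keep
df.select(cols.map(col): _*).show()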
First convert the String array to a List of Spark Column objects as below:

import org.apache.spark.sql.Column;
import java.util.ArrayList;
import java.util.List;

String[] strColNameArray = new String[]{"a", "b", "c", "d"};
List<Column> colNames = new ArrayList<>();
for (String strColName : strColNameArray) {
    colNames.add(new Column(strColName));
}

Then convert the List within the select statement using JavaConversions. You need the following import statement:

import scala.collection.JavaConversions;

Dataset<Row> selectedDF = df.select(JavaConversions.asScalaBuffer(colNames));

Note that JavaConversions is deprecated in newer Scala versions; scala.collection.JavaConverters provides the supported equivalents.
Yes, you can use .select in Scala. Use .head and .tail to pass all the values in the List:

Example

val cols = List("b", "c")
df.select(cols.head, cols.tail: _*)

Explanation

The string-based overload of select has the signature select(col: String, cols: String*). It requires at least one column name, so the first element is passed as head and the remaining elements are expanded into the varargs tail.
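One caveat: cols.head throws on an empty list. A minimal guard, assuming an empty list should leave df unchanged:

val selected = if (cols.isEmpty) df else df.select(cols.head, cols.tail: _*)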
Another option that I've just learnt:

import org.apache.spark.sql.functions.col

val columns = Seq[String]("col1", "col2", "col3")
val colNames = columns.map(name => col(name))
val selectedDf = df.select(colNames: _*)
You can pass arguments of type Column* to select:

import org.apache.spark.sql.Column

val df = spark.read.json("example.json")
val cols: List[String] = List("a", "b")
// convert each String to a Column
val columns: List[Column] = cols.map(df(_))
df.select(columns: _*)