how to select all columns that starts with a common label

后端未结

关注

 2  1421

I have a dataframe in Spark 1.6 and want to select just some columns out of it. The column names are like:

colA, colB, colC, colD, colE, colF-0, colF-1, colF-2
<


                      
              相关标签:


      
      
        
          2条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  别那么骄傲        
                
              
                            
                2021-02-13 19:38
              
            
            
                                                                       
First grab the column names with df.columns, then filter down to just the column names you want .filter(_.startsWith("colF")). This gives you an array of Strings. But the select takes select(String, String*). Luckily select for columns is select(Column*), so finally convert the Strings into Columns with .map(df(_)), and finally turn the Array of Columns into a var arg with : _*.

df.select(df.columns.filter(_.startsWith("colF")).map(df(_)) : _*).show


This filter could be made more complex (same as Pandas). It is however a rather ugly solution (IMO):

df.select(df.columns.filter(x => (x.equals("colA") || x.startsWith("colF"))).map(df(_)) : _*).show 


If the list of other columns is fixed you could also merge a fixed array of columns names with filtered array.

df.select((Array("colA", "colB") ++ df.columns.filter(_.startsWith("colF"))).map(df(_)) : _*).show

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  清歌不尽        
                
              
                            
                2021-02-13 19:38
              
            
            
                                                                       
I wrote a function that does that. Read the comments to see how it works.

  /**
    * Given a sequence of prefixes, select suitable columns from [[DataFrame]]
    * @param columnPrefixes Sequence of prefixes
    * @param dF Incoming [[DataFrame]]
    * @return [[DataFrame]] with prefixed columns selected
    */
  def selectPrefixedColumns(columnPrefixes: Seq[String], dF: DataFrame): DataFrame = {
    // Find out if given column name matches any of the provided prefixes
    def colNameStartsWith: String => Boolean = (colName: String) =>
        columnsPrefix.map(prefix => colName.startsWith(prefix)).reduce(_ || _)
    // Filter columns list by checking against given prefixes sequence
    val columns = dF.columns.filter(colNameStartsWith)
    // Select filtered columns list
    dF.select(columns.head, columns.tail:_*)
  }

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复