How to get table names from SQL query?

前端未结

关注

 6  1505

I want to get all the tables names from a sql query in Spark using Scala.

Lets say user sends a SQL query which looks like:

select * from table_1 as


                      
              相关标签:


      
      
        
          6条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  执念已碎        
                
              
                            
                2020-12-18 04:49
              
            
            
                                                                       
Hope it will help you 

Parse the given query using spark sql parser (spark internally does same). You can get sqlParser from session's state. It will give Logical plan of query. Iterate over logical plan of query & check whether it is instance of UnresolvedRelation (leaf logical operator to represent a table reference in a logical query plan that has yet to be resolved) & get table from it.

def getTables(query: String) : Seq[String] ={
    val logical : LogicalPlan = localsparkSession.sessionState.sqlParser.parsePlan(query)
    val tables = scala.collection.mutable.LinkedHashSet.empty[String]
    var i = 0
    while (true) {
      if (logical(i) == null) {
        return tables.toSeq
      } else if (logical(i).isInstanceOf[UnresolvedRelation]) {
        val tableIdentifier = logical(i).asInstanceOf[UnresolvedRelation].tableIdentifier
        tables += tableIdentifier.unquotedString.toLowerCase
      }
      i = i + 1
    }
    tables.toSeq
}

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  情书的邮戳        
                
              
                            
                2020-12-18 04:58
              
            
            
                                                                       
I had some complicated sql queries with nested queries and iterated on @Jacek Laskowski's answer to get this

  def getTables(spark: SparkSession, query: String): Seq[String] = {
    val logicalPlan = spark.sessionState.sqlParser.parsePlan(query)
    var tables = new ListBuffer[String]()
    var i: Int = 0

    while (logicalPlan(i) != null) {
      logicalPlan(i) match {
        case t: UnresolvedRelation => tables += t.tableName
        case _ => 
      }
      i += 1
    }

    tables.toList
  }

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  面向向阳花        
                
              
                            
                2020-12-18 04:58
              
            
            
                                                                       
unix did the trick, grep 'INTO\|FROM\|JOIN' .sql | sed -r 's/.?(FROM|INTO|JOIN)\s?([^ ])./\2/g' | sort -u

grep 'overwrite table' .txt | sed -r 's/.?(overwrite table)\s?([^ ])./\2/g' | sort -u
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  醉梦人生        
                
              
                            
                2020-12-18 05:06
              
            
            
                                                                       
def __sqlparse2table(self, query):
    '''
    @description: get table name from table
    '''
    plan = self.spark._jsparkSession.sessionState().sqlParser().parsePlan(query)
    plan_string = plan.toString().replace('`.`', '.')
    unr = re.findall(r"UnresolvedRelation `(.*?)`", plan_string)
    cte = re.findall(r"CTE \[(.*?)\]", plan.toString())
    cte = [tt.strip() for tt in cte[0].split(',')] if cte else cte
    schema = set()
    tables = set()
    for table_name in unr:
        if table_name not in cte:
            schema.update([table_name.split('.')[0]])
            tables.update([table_name])

    return schema, tables

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  栀梦        
                
              
                            
                2020-12-18 05:10
              
            
            
                                                                       
Thanks a lot @Swapnil Chougule for the answer. That inspired me to offer an idiomatic way of collecting all the tables in a structured query.

scala> spark.version
res0: String = 2.3.1

def getTables(query: String): Seq[String] = {
  val logicalPlan = spark.sessionState.sqlParser.parsePlan(query)
  import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
  logicalPlan.collect { case r: UnresolvedRelation => r.tableName }
}

val query = "select * from table_1 as a left join table_2 as b on a.id=b.id"
scala> getTables(query).foreach(println)
table_1
table_2

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  忘了有多久        
                
              
                            
                2020-12-18 05:15
              
            
            
                                                                       
Since you need to list all the columns names listed in table1 and table2, what you can do is to show tables in db.table_name in your hive db.

val tbl_column1 = sqlContext.sql("show tables in table1");
val tbl_column2 = sqlContext.sql("show tables in table2");


You will get list of columns in both the table.

tbl_column1.show

name      
id  
data    

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复