How to get table names from a SQL query?

太阳男子 2020-12-18 04:48

I want to get all the table names from a SQL query in Spark using Scala.

Let's say a user sends a SQL query that looks like:

select * from table_1 as         


        
6 Answers
  • 2020-12-18 04:49

    Hope it helps.

    Parse the given query using the Spark SQL parser (Spark does the same internally). You can get the sqlParser from the session's state; it gives you the logical plan of the query. Iterate over the nodes of the logical plan, check whether each node is an instance of UnresolvedRelation (the leaf logical operator that represents a table reference in a logical query plan that has yet to be resolved), and read the table name from it.

    import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

    def getTables(query: String): Seq[String] = {
      // localsparkSession is assumed to be a SparkSession already in scope
      val logical: LogicalPlan = localsparkSession.sessionState.sqlParser.parsePlan(query)
      val tables = scala.collection.mutable.LinkedHashSet.empty[String]
      var i = 0
      while (true) {
        // TreeNode.apply(i) returns the i-th node of the plan tree, or null past the end
        if (logical(i) == null) {
          return tables.toSeq
        } else if (logical(i).isInstanceOf[UnresolvedRelation]) {
          val tableIdentifier = logical(i).asInstanceOf[UnresolvedRelation].tableIdentifier
          tables += tableIdentifier.unquotedString.toLowerCase
        }
        i = i + 1
      }
      tables.toSeq
    }
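
    A quick usage sketch, assuming the SparkSession localsparkSession above is in scope (the query string is just an illustration):

        getTables("select * from table_1 a join table_2 b on a.id = b.id")
        // expected: Seq("table_1", "table_2")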
    
  • 2020-12-18 04:58

    I had some complicated SQL queries with nested subqueries and iterated on @Jacek Laskowski's answer to get this:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
    import scala.collection.mutable.ListBuffer

    def getTables(spark: SparkSession, query: String): Seq[String] = {
      val logicalPlan = spark.sessionState.sqlParser.parsePlan(query)
      val tables = new ListBuffer[String]()
      var i: Int = 0

      // walk the plan tree node by node; apply(i) yields null once i is past the last node
      while (logicalPlan(i) != null) {
        logicalPlan(i) match {
          case t: UnresolvedRelation => tables += t.tableName
          case _ =>
        }
        i += 1
      }

      tables.toList
    }
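
    For instance, with a nested subquery (table names here are made up for illustration):

        getTables(spark, "select * from (select id from inner_tbl) t join outer_tbl o on t.id = o.id")
        // expected: List("inner_tbl", "outer_tbl")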
    
  • 2020-12-18 04:58

    Unix did the trick:

        grep 'INTO\|FROM\|JOIN' *.sql | sed -r 's/.*?(FROM|INTO|JOIN)\s*?([^ ]*).*/\2/g' | sort -u

        grep 'overwrite table' *.txt | sed -r 's/.*?(overwrite table)\s*?([^ ]*).*/\2/g' | sort -u
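
    For instance, a made-up input line like INSERT INTO sales_daily SELECT 1 should come out of the first pipeline as just sales_daily: the sed step keeps only the identifier after the last matching keyword on each line, and sort -u deduplicates the results.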

  • 2020-12-18 05:06
    # Assumes: import re, and self.spark is an active SparkSession
    def __sqlparse2table(self, query):
        '''
        @description: get the schemas and table names referenced by a query
        '''
        # parse via the JVM-side Spark SQL parser exposed through py4j
        plan = self.spark._jsparkSession.sessionState().sqlParser().parsePlan(query)
        plan_string = plan.toString().replace('`.`', '.')
        # tables appear as UnresolvedRelation nodes; CTE names must be filtered out
        unr = re.findall(r"UnresolvedRelation `(.*?)`", plan_string)
        cte = re.findall(r"CTE \[(.*?)\]", plan.toString())
        cte = [tt.strip() for tt in cte[0].split(',')] if cte else cte
        schema = set()
        tables = set()
        for table_name in unr:
            if table_name not in cte:
                schema.update([table_name.split('.')[0]])
                tables.update([table_name])

        return schema, tables
    
  • 2020-12-18 05:10

    Thanks a lot @Swapnil Chougule for the answer. That inspired me to offer an idiomatic way of collecting all the tables in a structured query.

    scala> spark.version
    res0: String = 2.3.1
    
    def getTables(query: String): Seq[String] = {
      val logicalPlan = spark.sessionState.sqlParser.parsePlan(query)
      import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
      logicalPlan.collect { case r: UnresolvedRelation => r.tableName }
    }
    
    val query = "select * from table_1 as a left join table_2 as b on a.id=b.id"
    scala> getTables(query).foreach(println)
    table_1
    table_2
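
    Note that collect here is Catalyst's TreeNode.collect: it does a pre-order traversal of the entire plan tree, so tables inside FROM-clause subqueries are picked up without any manual index-based iteration.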
    
  • 2020-12-18 05:15

    Since you need to list all the column names in table1 and table2, what you can do is run show tables in db.table_name against your Hive db.

    val tbl_column1 = sqlContext.sql("show tables in table1");
    val tbl_column2 = sqlContext.sql("show tables in table2");
    

    You will get the list of columns for both tables.

    tbl_column1.show

    name      
    id  
    data    
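
    If what you actually need is the column names, here is a minimal alternative sketch using the standard describe statement (the table name is a placeholder):

        val tbl1Cols = sqlContext.sql("describe table1")
        tbl1Cols.show()   // one row per column: col_name, data_type, comment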
    