Executing multiple SQL queries on Spark

Asked by 北海茫月 on 2021-01-16 22:30

I have Spark SQL queries in a file test.sql:

CREATE GLOBAL TEMPORARY VIEW VIEW_1 AS select a,b from abc

CREATE GLO
1 Answer
  • 2021-01-16 23:07

    The problem is that mkString concatenates all the lines in a single string, which cannot be properly parsed as a valid SQL query.

    Each line from the script file should be executed as a separate query, for example:

    scala.io.Source.fromFile("test.sql").getLines()
      .filterNot(_.isEmpty)  // filter out empty lines
      .foreach(query =>
        spark.sql(query).show
      )
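As a side note, Source keeps a file handle open, so it is worth closing it after reading. A small sketch using scala.util.Using (Scala 2.13+); readNonEmptyLines is a hypothetical helper name, not part of the original answer:

```scala
import scala.io.Source
import scala.util.Using

// Hypothetical helper: read a script file and return its non-empty,
// trimmed lines, closing the underlying file handle afterwards.
def readNonEmptyLines(path: String): List[String] =
  Using.resource(Source.fromFile(path)) { source =>
    source.getLines().map(_.trim).filterNot(_.isEmpty).toList
  }
```

Each returned line can then be passed to spark.sql exactly as in the loop above.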
    

    Update

    If a query spans more than one line, the case is a bit more complex.

    We need a token that marks the end of a query. Let it be the semicolon character, as in standard SQL.

    First, we collect all non-empty lines from the source file:

    val lines = scala.io.Source.fromFile(sqlFile).getLines().filterNot(_.isEmpty)
    

    Then we process the collected lines, concatenating each new line with the previous one if the previous one does not yet end with a semicolon:

    val queries = lines.foldLeft(List[String]()) { case (queries, line) =>
      queries match {
        case Nil => List(line.trim) // case for the very first line
        case init :+ last =>
          if (last.endsWith(";")) {
            // the previous query is complete; this line starts a new one
            queries :+ line.trim
          } else {
            // the query is not terminated yet; append this line to it
            init :+ (last + " " + line.trim)
          }
      }
    }
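    As a sanity check, the fold above can be exercised on a small hypothetical script: the view definition from the question split across two lines, followed by a second query.

    ```scala
    // Hypothetical script content, already split into non-empty lines
    val lines = List(
      "CREATE GLOBAL TEMPORARY VIEW VIEW_1 AS",
      "select a, b from abc;",
      "select * from global_temp.VIEW_1;"
    )

    // Group lines into complete, semicolon-terminated statements
    val queries = lines.foldLeft(List[String]()) { case (queries, line) =>
      queries match {
        case Nil => List(line.trim)
        case init :+ last =>
          if (last.endsWith(";")) queries :+ line.trim
          else init :+ (last + " " + line.trim)
      }
    }

    queries.foreach(println)
    // CREATE GLOBAL TEMPORARY VIEW VIEW_1 AS select a, b from abc;
    // select * from global_temp.VIEW_1;
    ```

    Each grouped statement can then be run with spark.sql; depending on the Spark version, the trailing semicolon may need to be stripped first, e.g. spark.sql(query.stripSuffix(";")).show.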
    