What's the difference between SparkSession.sql and Dataset.sqlContext.sql?

不思量自难忘° 2021-01-06 07:20

I have the following code snippets and I wonder what the difference between the two is and which one I should use. I am using Spark 2.2.

Dataset<         


        
1 answer
  •  小鲜肉 (original poster)
    2021-01-06 08:17

    There is a very subtle difference between sparkSession.sql("sql query") and df.sqlContext().sql("sql query").

    Please note that you can have zero, one, or several SparkSessions in a single Spark application (although a Spark SQL application is assumed to have at least one, and often only one).

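    Please also note that a Dataset is bound to the SparkSession it was created in, and that binding never changes. So df.sqlContext().sql("sql query") always runs the query in the Dataset's own session, which is not necessarily the session your sparkSession variable refers to.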
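
    The binding described above can be checked directly. This is a minimal sketch, not part of the original answer; it assumes a local Spark 2.x installation on the classpath, and the object name, master URL, and application name are illustrative only.

    ```scala
    import org.apache.spark.sql.SparkSession

    object DatasetSessionBinding {
      def main(args: Array[String]): Unit = {
        // Assumption: local master and app name are chosen for illustration.
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("dataset-session-binding")
          .getOrCreate()

        val df = spark.range(5)
        val anotherSession = spark.newSession()

        // The Dataset keeps a reference to the session that created it.
        assert(df.sparkSession eq spark)
        // df.sqlContext is a thin wrapper around that same session.
        assert(df.sqlContext.sparkSession eq spark)
        // newSession() returns a distinct SparkSession object...
        assert(!(anotherSession eq spark))
        // ...that still shares the underlying SparkContext.
        assert(anotherSession.sparkContext eq spark.sparkContext)

        spark.stop()
      }
    }
    ```

    If all assertions pass, df.sqlContext.sql(...) and spark.sql(...) are interchangeable for this df, while anotherSession.sql(...) is not.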

    You may be wondering why anyone would want more than one session. Separate sessions give you a boundary between queries: temporary views are scoped to a session, so you can use the same table names for different datasets in different sessions. That is actually a very powerful feature of Spark SQL.

    The following example shows the difference and hopefully will give you some idea why it's powerful after all.

    scala> spark.version
    res0: String = 2.3.0-SNAPSHOT
    
    scala> :type spark
    org.apache.spark.sql.SparkSession
    
    scala> spark.sql("show tables").show
    +--------+---------+-----------+
    |database|tableName|isTemporary|
    +--------+---------+-----------+
    +--------+---------+-----------+
    
    scala> val df = spark.range(5)
    df: org.apache.spark.sql.Dataset[Long] = [id: bigint]
    
    scala> df.sqlContext.sql("show tables").show
    +--------+---------+-----------+
    |database|tableName|isTemporary|
    +--------+---------+-----------+
    +--------+---------+-----------+
    
    scala> val anotherSession = spark.newSession
    anotherSession: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@195c5803
    
    scala> anotherSession.range(10).createOrReplaceTempView("new_table")
    
    scala> anotherSession.sql("show tables").show
    +--------+---------+-----------+
    |database|tableName|isTemporary|
    +--------+---------+-----------+
    |        |new_table|       true|
    +--------+---------+-----------+
    
    
    scala> df.sqlContext.sql("show tables").show
    +--------+---------+-----------+
    |database|tableName|isTemporary|
    +--------+---------+-----------+
    +--------+---------+-----------+
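
    The same idea as a standalone program: a minimal sketch, assuming Spark 2.x on the classpath. The view name numbers and the object name are hypothetical and do not come from the transcript above. Two sessions register a temporary view under the same name, and each query resolves the name within its own session.

    ```scala
    import org.apache.spark.sql.SparkSession

    object SessionScopedViews {
      def main(args: Array[String]): Unit = {
        // Assumption: local master and app name are chosen for illustration.
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("session-scoped-views")
          .getOrCreate()
        val anotherSession = spark.newSession()

        // Same view name, different data, one view per session.
        spark.range(5).createOrReplaceTempView("numbers")
        anotherSession.range(10).createOrReplaceTempView("numbers")

        // df was created in `spark`, so df.sqlContext.sql runs in `spark`.
        val df = spark.range(1)
        val viaDataset =
          df.sqlContext.sql("SELECT count(*) FROM numbers").head().getLong(0)
        val viaOther =
          anotherSession.sql("SELECT count(*) FROM numbers").head().getLong(0)

        assert(viaDataset == 5L)  // view registered in `spark`
        assert(viaOther == 10L)   // view registered in `anotherSession`

        spark.stop()
      }
    }
    ```

    Which call to use therefore depends on which session holds the views you want to query: sparkSession.sql uses the session you name, while df.sqlContext().sql uses the session that created df.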
    
