spark createOrReplaceTempView vs createGlobalTempView

Submitted anonymously (unverified) on 2019-12-03 08:30:34

Question:

Spark Dataset 2.0 provides two functions, createOrReplaceTempView and createGlobalTempView. I am not able to understand the basic difference between the two functions.

According to API documents:

createOrReplaceTempView: The lifetime of this temporary view is tied to the [[SparkSession]] that was used to create this Dataset.
So, when I call sparkSession.close(), will the view I defined be destroyed? Is that true?

createGlobalTempView: The lifetime of this temporary view is tied to this Spark application.

When will this type of view be destroyed? Is there an example, something like sparkSession.close()?

Answer 1:

df.createOrReplaceTempView("tempViewName")
df.createGlobalTempView("tempViewName")

createOrReplaceTempView() creates or replaces a local temporary view backed by this DataFrame df. The lifetime of this view is tied to the SparkSession that created it. If you want to drop this view:

spark.catalog.dropTempView("tempViewName") 
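To make the "local" part concrete, here is a minimal sketch (the view name and an active SparkSession named spark with a DataFrame df are assumed for illustration) showing that a local temp view is visible only in the session that registered it:

```scala
// assumes an active SparkSession `spark` and an existing DataFrame `df`
df.createOrReplaceTempView("tempViewName")
spark.sql("SELECT * FROM tempViewName").show()   // resolves in this session
// spark.newSession().sql("SELECT * FROM tempViewName") // would fail: view not found in the new session
spark.catalog.dropTempView("tempViewName")       // explicitly removes it from this session's catalog
```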

or stop() will shut down the session:

self.ss = SparkSession(sc)
...
self.ss.stop()

createGlobalTempView() creates a global temporary view backed by this DataFrame df. The lifetime of this view is tied to the Spark application itself. If you want to drop it:

spark.catalog.dropGlobalTempView("tempViewName") 
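Note that global temporary views live in the reserved global_temp database, so they must be qualified with that prefix when queried. A minimal sketch (the view name and an active SparkSession named spark with a DataFrame df are assumed for illustration):

```scala
// assumes an active SparkSession `spark` and an existing DataFrame `df`
df.createGlobalTempView("tempViewName")
spark.sql("SELECT * FROM global_temp.tempViewName").show()              // qualified name required
spark.newSession().sql("SELECT * FROM global_temp.tempViewName").show() // visible from other sessions too
spark.catalog.dropGlobalTempView("tempViewName")                        // explicitly removes it
```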

or stop() on the SparkContext will shut down the application:

ss = SparkContext(conf=conf, ......)
...
ss.stop()


Answer 2:

The answer to your question comes down to understanding the difference between a Spark application and a SparkSession.

A Spark application can be used:

  • for a single batch job
  • for an interactive session with multiple jobs
  • as a long-lived server continually satisfying requests

Also note:

  • A Spark job can consist of more than just a single map and reduce.
  • A Spark application can consist of more than one session.

A SparkSession, on the other hand, is associated with a Spark application:

  • Generally, a session is an interaction between two or more entities.
  • In Spark 2.0 you can use SparkSession.
  • A SparkSession can be created without creating a SparkConf, SparkContext, or SQLContext (they are encapsulated within the SparkSession).

Global temporary views were introduced in the Spark 2.1.0 release. This feature is useful when you want to share data among different sessions and keep it alive until your application ends. Please see a short sample I wrote to illustrate the use of createTempView and createGlobalTempView:

import org.apache.spark.sql.SparkSession

object NewSessionApp {

  def main(args: Array[String]): Unit = {

    val logFile = "data/README.md" // Should be some file on your system
    val spark = SparkSession.
      builder.
      appName("Simple Application").
      master("local").
      getOrCreate()

    val logData = spark.read.textFile(logFile).cache()
    logData.createGlobalTempView("logdata")
    spark.range(1).createTempView("foo")

    // within the same session the foo table exists
    println("""spark.catalog.tableExists("foo") = """ + spark.catalog.tableExists("foo"))
    // spark.catalog.tableExists("foo") = true

    // for a new session the foo table does not exist
    val newSpark = spark.newSession
    println("""newSpark.catalog.tableExists("foo") = """ + newSpark.catalog.tableExists("foo"))
    // newSpark.catalog.tableExists("foo") = false

    // both sessions can access the logdata table
    spark.sql("SELECT * FROM global_temp.logdata").show()
    newSpark.sql("SELECT * FROM global_temp.logdata").show()

    spark.stop()
  }
}

