How does createOrReplaceTempView work in Spark?

I am new to Spark and Spark SQL.

How does createOrReplaceTempView work in Spark?

If we register an RDD of objects as a table will

3 answers

小鲜肉

2020-12-02 14:43
createOrReplaceTempView creates a temporary view of the table. The view is not persistent at this point, but you can run SQL queries on top of it. If you want to keep the data, you can either persist the DataFrame or use saveAsTable to save it.

First we read the data in CSV format, convert it to a DataFrame, and create a temp view.

Reading the data in CSV format:
```
val data = spark.read.format("csv").option("header","true").option("inferSchema","true").load("FileStore/tables/pzufk5ib1500654887654/campaign.csv")
```
Printing the schema and creating the temp view:
```
data.printSchema
data.createOrReplaceTempView("Data")
```
Now we can run SQL queries on top of the view we just created:
```
%sql select Week as Date, `Campaign Type`, Engagements, Country from Data order by Date asc
```
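For readers without the CSV file or a notebook, the same flow can be sketched as a self-contained script. The local master, the app name, and the sample rows are assumptions made only for illustration; they stand in for the real campaign data.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session just for this sketch.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("temp-view-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in rows for the campaign CSV.
val data = Seq(
  ("2017-01-09", "Social", 95),
  ("2017-01-02", "Email", 120)
).toDF("Week", "Campaign Type", "Engagements")

// Registers a name for the DataFrame; no data is copied or written.
data.createOrReplaceTempView("Data")

// Column names that contain spaces must be wrapped in backticks.
val rows = spark.sql(
  "SELECT Week, `Campaign Type`, Engagements FROM Data ORDER BY Week ASC"
).collect()

rows.foreach(println)
```

Outside a notebook there is no `%sql` magic, so the query goes through `spark.sql(...)` instead.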
心在旅途

2020-12-02 14:43

Spark SQL supports writing programs with the Dataset and DataFrame APIs, and alongside them it also needs to support SQL.

To support SQL on DataFrames, Spark first needs a table definition with column names. If it created a real table for every DataFrame, the Hive metastore would fill up with unnecessary tables. So instead it creates a temporary view, which is available only for the time being and can be used like any other Hive table; once the Spark session stops, the view is removed.

To create the view, the developer uses a utility called createOrReplaceTempView.
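The lifecycle described above can be sketched as follows. The view name `events` and the sample values are hypothetical. The sketch relies on `spark.catalog.tableExists`, `spark.newSession()`, and `spark.catalog.dropTempView`, and assumes a reasonably recent Spark version (2.1 or later) where `dropTempView` returns a Boolean.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session just for this sketch.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("temp-view-scope")
  .getOrCreate()
import spark.implicits._

// Create the view, then register a different DataFrame under the same name.
Seq(1, 2, 3).toDF("num").createOrReplaceTempView("events")
Seq(10, 20).toDF("num").createOrReplaceTempView("events")

// The second registration replaced the first instead of failing.
val countAfterReplace = spark.table("events").count()

// The view is visible in the session that created it...
val visibleHere = spark.catalog.tableExists("events")

// ...but not in a separate session that shares the same SparkContext.
val visibleInNewSession = spark.newSession().catalog.tableExists("events")

// It can also be dropped explicitly before the session ends.
val dropped = spark.catalog.dropTempView("events")
val visibleAfterDrop = spark.catalog.tableExists("events")
```

Because nothing is written to the metastore, there is nothing to clean up after the session ends.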

鱼传尺愫

2020-12-02 14:50
createOrReplaceTempView creates (or replaces, if that view name already exists) a lazily evaluated "view" that you can then use like a Hive table in Spark SQL. It is not persisted to memory unless you cache the dataset that underpins the view.
```
scala> val s = Seq(1,2,3).toDF("num")
s: org.apache.spark.sql.DataFrame = [num: int]

scala> s.createOrReplaceTempView("nums")

scala> spark.table("nums")
res22: org.apache.spark.sql.DataFrame = [num: int]

scala> spark.table("nums").cache
res23: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [num: int]

scala> spark.table("nums").count
res24: Long = 3
```
The data is fully cached only after the .count call.
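One way to inspect the cache status from code is `spark.catalog.isCached`. This is a minimal sketch with an assumed local session. As far as I understand it, `isCached` reports whether the view's plan has been registered for caching, not whether the blocks are already materialized, so it turns true right after `.cache()`, while the data itself is only computed and stored by the first action.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session just for this sketch.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("temp-view-cache")
  .getOrCreate()
import spark.implicits._

Seq(1, 2, 3).toDF("num").createOrReplaceTempView("nums")

// Creating the view alone does not cache anything.
val cachedBefore = spark.catalog.isCached("nums")

// Mark the dataset behind the view for caching (lazy).
spark.table("nums").cache()
val markedForCaching = spark.catalog.isCached("nums")

// The first action computes the rows and fills the cache.
val n = spark.table("nums").count()

// Release the cached data when it is no longer needed.
spark.catalog.uncacheTable("nums")
val cachedAfterUncache = spark.catalog.isCached("nums")
```

The Storage tab of the Spark UI shows the materialized size, which is the place to look if you need to confirm that the data has actually been loaded.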

Related SO: spark createOrReplaceTempView vs createGlobalTempView

Relevant quote (comparing to persistent table): "Unlike the createOrReplaceTempView command, saveAsTable will materialize the contents of the DataFrame and create a pointer to the data in the Hive metastore." from https://spark.apache.org/docs/latest/sql-programming-guide.html#saving-to-persistent-tables

Note: createOrReplaceTempView was formerly registerTempTable.