Problem:
I would like to use a JDBC connection to make a custom request using Spark. The goal of this query is to optimize memory allocation.
Spark can read and write data to/from relational databases using the JDBC data source (like you did in your first code example).
In addition (and completely separately), Spark lets you use SQL to query views created over data that has already been loaded into a DataFrame from some source. For example:
import spark.implicits._ // required for the toDF conversion
val df = Seq(1, 2, 3).toDF("a") // could be any DF, loaded from file/JDBC/memory...
df.createOrReplaceTempView("my_spark_table")
spark.sql("select a from my_spark_table").show()
Only "tables" (called views, as of Spark 2.0.0) created this way can be queried using SparkSession.sql
.
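For instance, calling spark.sql against a name that exists only in the database (and was never registered as a view) fails; a quick illustration, assuming the table name used in the snippet below:

spark.sql("select * from schema.tablename") // throws AnalysisException: Table or view not found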
If your data is stored in a relational database, Spark has to read it from there first; only then can it execute any distributed computation on the loaded copy. Bottom line: we can load the data from the table using read, create a temp view, and then query it:
spark.read
.format("jdbc")
.option("url", "jdbc:mysql://127.0.0.1/database_name")
.option("dbtable", "schema.tablename")
.option("user", "username")
.option("password", "password")
.load()
.createOrReplaceTempView("my_spark_table")
// and then you can query the view:
val df = spark.sql("select * from my_spark_table where ... ")
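Since the original goal was a custom query that keeps memory usage down, note that you don't have to load the whole table before filtering: the dbtable option also accepts a parenthesized subquery, which the database executes itself, so Spark only receives the result set. A minimal sketch, assuming the same connection details as above (the column names and the predicate are placeholders):

val filtered = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://127.0.0.1/database_name")
  // the subquery runs on the database side; the alias ("t") is required
  .option("dbtable", "(select col_a, col_b from schema.tablename where col_a > 100) t")
  .option("user", "username")
  .option("password", "password")
  .load()
filtered.createOrReplaceTempView("my_filtered_table")

On Spark 2.4+ there is also a dedicated query option that takes the SELECT statement directly, without the subquery-and-alias wrapping.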