How to connect to Amazon Redshift or other DBs in Apache Spark?


刺人心 2021-01-13 09:44

I'm trying to connect to Amazon Redshift via Spark, so I can join data we have on S3 with data on our RS cluster. I found some very spartan documentation here for the capab

6 answers

无人共我 (original poster)

2021-01-13 10:19

If you're using Spark 1.4.0 or newer, check out spark-redshift, a library which supports loading data from Redshift into Spark SQL DataFrames and saving DataFrames back to Redshift. If you're querying large volumes of data, this approach should perform better than JDBC because it will be able to unload and query the data in parallel.
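As a rough illustration of the spark-redshift approach, here is a minimal PySpark sketch. It assumes the spark-redshift package and a Redshift JDBC driver are already on the classpath, and that `sql_context` is an existing `SQLContext`. The helper names, host, table, and bucket are hypothetical; the format name and the `url`, `dbtable`, and `tempdir` options are the ones I remember from the library's README, so check them against the version you install (newer releases also want an explicit S3 credentials option).

```python
def redshift_read_options(jdbc_url, table, temp_dir):
    """Build the option map for a spark-redshift read (hypothetical helper)."""
    return {
        "url": jdbc_url,      # JDBC URL of the Redshift cluster
        "dbtable": table,     # table to load
        "tempdir": temp_dir,  # S3 scratch location the data is unloaded to
    }


def load_redshift_table(sql_context, jdbc_url, table, temp_dir):
    """Load a Redshift table as a DataFrame through spark-redshift."""
    opts = redshift_read_options(jdbc_url, table, temp_dir)
    return (sql_context.read
            .format("com.databricks.spark.redshift")
            .options(**opts)
            .load())


# Usage sketch (placeholder values, not real endpoints):
# users = load_redshift_table(
#     sql_context,
#     "jdbc:redshift://example-host:5439/dev?user=USER&password=PASS",
#     "users",
#     "s3n://example-bucket/tmp/")
# events = sql_context.read.parquet("s3n://example-bucket/events/")
# joined = events.join(users, "user_id")
```

Once both sides are DataFrames, the join between the S3 data and the Redshift table is an ordinary Spark SQL join.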

If you still want to use JDBC, check out the new built-in JDBC data source in Spark 1.4+.
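For the plain JDBC route, something along these lines should work in PySpark. This is a sketch under assumptions: `sql_context` is an existing `SQLContext`, a Redshift-compatible JDBC driver jar is on the classpath, and the helper names and connection values are made up for illustration. The `driver` class name is left to the caller because it depends on which driver jar you use.

```python
def redshift_jdbc_url(host, database, port=5439):
    """Format a Redshift JDBC URL; 5439 is Redshift's default port."""
    return "jdbc:redshift://{0}:{1}/{2}".format(host, port, database)


def load_via_jdbc(sql_context, url, table, user, password, driver=None):
    """Load a table through Spark's built-in JDBC data source."""
    opts = {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
    }
    if driver is not None:
        # Fully qualified JDBC driver class, only if Spark cannot infer it.
        opts["driver"] = driver
    return sql_context.read.format("jdbc").options(**opts).load()


# Usage sketch (placeholder values):
# url = redshift_jdbc_url("example-host", "dev")
# users = load_via_jdbc(sql_context, url, "users", "USER", "PASS")
```

Without partitioning options such as `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions`, a JDBC read like this goes through a single connection, which is why it tends to be slower on large tables.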

Disclosure: I'm one of the authors of spark-redshift.
