How to connect to Amazon Redshift or other DBs in Apache Spark?

刺人心 2021-01-13 09:44

I'm trying to connect to Amazon Redshift via Spark so I can join data we have on S3 with data on our RS cluster. I found some very spartan documentation here for the capab

6 Answers
  •  无人共我
    2021-01-13 10:19

    If you're using Spark 1.4.0 or newer, check out spark-redshift, a library which supports loading data from Redshift into Spark SQL DataFrames and saving DataFrames back to Redshift. If you're querying large volumes of data, this approach should perform better than JDBC because it will be able to unload and query the data in parallel.
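As a rough illustration of the spark-redshift route, here is a minimal PySpark sketch. It assumes the spark-redshift package and a Redshift JDBC driver are already on the classpath and that the cluster can write to the S3 staging location. The helper names (`redshift_read_options`, `load_redshift_table`) and all URLs, table names, and bucket paths are hypothetical placeholders; depending on the library version, additional S3 credential settings may also be required.

```python
def redshift_read_options(jdbc_url, table, s3_tempdir):
    """Build the option map for the spark-redshift data source.

    Hypothetical helper: only the option keys come from spark-redshift.
    """
    return {
        "url": jdbc_url,        # JDBC URL of the Redshift cluster
        "dbtable": table,       # table to load ("query" may be used instead)
        "tempdir": s3_tempdir,  # S3 staging location the data is unloaded to
    }


def load_redshift_table(session, jdbc_url, table, s3_tempdir):
    """Return a DataFrame backed by a Redshift table.

    `session` is anything exposing `.read`, e.g. a SQLContext on Spark 1.4+
    or a SparkSession on later releases.
    """
    opts = redshift_read_options(jdbc_url, table, s3_tempdir)
    return (session.read
            .format("com.databricks.spark.redshift")
            .options(**opts)
            .load())
```

The resulting DataFrame can then be joined against a DataFrame read from S3 in the usual way, which is the use case described in the question.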

    If you still want to use JDBC, check out the new built-in JDBC data source in Spark 1.4+.
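For the plain JDBC route, a comparable sketch using Spark's built-in `jdbc` source might look like the following. The helper names and every connection value are hypothetical; the driver class depends on which JDBC driver jar you ship, and on some older Spark releases the credentials may need to be embedded in the URL rather than passed as options. The partitioning options are optional and only make sense for a numeric column with known bounds.

```python
def jdbc_read_options(jdbc_url, table, user, password,
                      driver=None, partitioning=None):
    """Build the option map for Spark's built-in JDBC data source.

    Hypothetical helper. `partitioning` is an optional tuple of
    (column, lower_bound, upper_bound, num_partitions).
    """
    opts = {
        "url": jdbc_url,
        "dbtable": table,
        "user": user,
        "password": password,
    }
    if driver is not None:
        opts["driver"] = driver  # JDBC driver class name, if not auto-detected
    if partitioning is not None:
        column, lower, upper, num_partitions = partitioning
        # These four options let Spark split the read into parallel range queries.
        opts.update({
            "partitionColumn": column,
            "lowerBound": str(lower),
            "upperBound": str(upper),
            "numPartitions": str(num_partitions),
        })
    return opts


def load_jdbc_table(session, **kwargs):
    """Return a DataFrame read over JDBC; `session` must expose `.read`."""
    return (session.read
            .format("jdbc")
            .options(**jdbc_read_options(**kwargs))
            .load())
```

Without the partitioning options the table is read through a single connection, which is one reason the unload-based approach tends to scale better for large tables.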

    Disclosure: I'm one of the authors of spark-redshift.
