I'm trying to connect to Amazon Redshift via Spark, so I can join data we have on S3 with data on our RS cluster. I found some very spartan documentation here for the capability…
You first need to download the Postgres JDBC driver. You can find it here: https://jdbc.postgresql.org/
You can either define the SPARK_CLASSPATH environment variable in .bashrc, conf/spark-env.sh, or a similar file, or set it in the script before you run your IPython notebook.
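For example, if you create the SparkContext yourself inside the notebook, a rough sketch of the env-var approach from Python looks like this (the jar path is a placeholder; note that SPARK_CLASSPATH is deprecated in newer Spark releases in favor of spark.driver.extraClassPath, shown below):

import os
from pyspark import SparkContext

# Hypothetical jar location; must be set before the SparkContext (and its JVM) is started
os.environ["SPARK_CLASSPATH"] = "/path/to/file/postgresql-9.4-1201.jdbc41.jar"

sc = SparkContext(appName="redshift-s3-join")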
You can also define it in your conf/spark-defaults.conf in the following way:
spark.driver.extraClassPath /path/to/file/postgresql-9.4-1201.jdbc41.jar
Make sure it is reflected in the Environment tab of your Spark WebUI.
You will also need to set the appropriate AWS credentials so Spark can read from S3, in the following way:
sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "***")
sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "***")