How to connect to Amazon Redshift or other DBs in Apache Spark?

刺人心 · 2021-01-13 09:44

I'm trying to connect to Amazon Redshift via Spark, so I can join data we have on S3 with data on our RS cluster. I found some very spartan documentation here for the capab…

6 Answers
  •  不思量自难忘° · 2021-01-13 10:17

    You first need to download the Postgres JDBC driver (Redshift speaks the Postgres wire protocol, so the standard Postgres driver works). You can find it here: https://jdbc.postgresql.org/

    You can either define the SPARK_CLASSPATH environment variable in .bashrc, conf/spark-env.sh, or a similar file, or export it in the shell before you launch your IPython notebook.
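
    For example, a minimal sketch of the environment-variable approach for an IPython/PySpark session (the jar path is an assumption; point it at wherever you saved the driver):

    export SPARK_CLASSPATH=/path/to/file/postgresql-9.4-1201.jdbc41.jar
    PYSPARK_DRIVER_PYTHON=ipython pyspark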

    You can also define it in your conf/spark-defaults.conf in the following way:

    spark.driver.extraClassPath  /path/to/file/postgresql-9.4-1201.jdbc41.jar
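
    Equivalently, if you submit jobs with spark-submit, the same jar can be passed on the command line (paths shown are examples):

    spark-submit --driver-class-path /path/to/file/postgresql-9.4-1201.jdbc41.jar your_app.jar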
    

    Make sure the jar shows up in the Environment tab of the Spark Web UI.

    You will also need to set AWS credentials so Spark can read your S3 data (here via the s3n filesystem):

    // Credentials for s3n:// paths, set on the Hadoop configuration
    sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "***")
    sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "***")
    
