How to connect to Amazon Redshift or other DBs in Apache Spark?

刺人心 2021-01-13 09:44

I'm trying to connect to Amazon Redshift via Spark so that I can join data we have on S3 with data on our Redshift cluster. I found some very spartan documentation here for the capab

6 answers

傲寒 (original poster)

2021-01-13 10:09

Although this is a very old post, for anyone still looking for an answer, the steps below worked for me.

Start the shell with the JDBC driver jar on the classpath:

bin/pyspark --driver-class-path /path_to_postgresql-42.1.4.jar --jars /path_to_postgresql-42.1.4.jar

Create a DataFrame by supplying the connection details:

myDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:redshift://host:port/db_name") \
    .option("dbtable", "table_name") \
    .option("user", "user_name") \
    .option("password", "password") \
    .load()

Spark Version: 2.2
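For reuse, the read above can be wrapped in a small helper. This is only a minimal sketch, not part of the original answer: the function names `redshift_jdbc_options` and `read_table`, the `scheme` parameter, and the sample host/table values are hypothetical. One assumption worth flagging: the PostgreSQL JDBC driver normally only accepts URLs starting with `jdbc:postgresql://`, so if the PostgreSQL jar is the only driver on the classpath you may need `scheme="postgresql"` and `driver="org.postgresql.Driver"`; the `jdbc:redshift://` form is the one Amazon's own Redshift driver expects.

```python
def redshift_jdbc_options(host, port, database, table, user, password,
                          scheme="redshift", driver=None):
    """Build the option dict passed to spark.read.format("jdbc").

    scheme: "redshift" (Amazon's Redshift JDBC driver) or "postgresql"
            (PostgreSQL JDBC driver). Hypothetical parameter for illustration.
    driver: optional fully qualified JDBC driver class name.
    """
    if scheme not in ("redshift", "postgresql"):
        raise ValueError("scheme must be 'redshift' or 'postgresql'")
    options = {
        "url": f"jdbc:{scheme}://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
    }
    # Only set the driver class when the caller names one explicitly.
    if driver is not None:
        options["driver"] = driver
    return options


def read_table(spark, options):
    """Load a table as a DataFrame using an existing SparkSession."""
    return spark.read.format("jdbc").options(**options).load()


if __name__ == "__main__":
    # Placeholder values; 5439 is the default Redshift port.
    opts = redshift_jdbc_options(
        "example-host", 5439, "db_name", "table_name",
        "user_name", "password",
        scheme="postgresql", driver="org.postgresql.Driver",
    )
    print(opts["url"])
```

Inside the pyspark shell started as shown above, `myDF = read_table(spark, opts)` would then perform the same read as the original snippet.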
