I\'m trying to connect to Amazon Redshift via Spark, so I can join data we have on S3 with data on our RS cluster. I found some very spartan documentation here for the capab
Although this seems to be a very old post, anyone who is still looking for answer, below steps worked for me!
Start the shell including the jar.
bin/pyspark --driver-class-path /path_to_postgresql-42.1.4.jar --jars /path_to_postgresql-42.1.4.jar
Create a df by giving appropriate details:
myDF = spark.read \
.format("jdbc") \
.option("url", "jdbc:redshift://host:port/db_name") \
.option("dbtable", "table_name") \
.option("user", "user_name") \
.option("password", "password") \
.load()
Spark Version: 2.2