How to connect to Amazon Redshift or other DBs in Apache Spark?

刺人心 2021-01-13 09:44

I'm trying to connect to Amazon Redshift via Spark, so I can join data we have on S3 with data on our RS cluster. I found some very spartan documentation here for the capab…

6 Answers
  •  傲寒
    2021-01-13 10:09

    Although this is a very old post, for anyone still looking for an answer, the steps below worked for me.

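    Start the PySpark shell with the PostgreSQL JDBC driver JAR passed to both --driver-class-path and --jars, so that the driver and the executors can load it: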

    bin/pyspark --driver-class-path /path_to_postgresql-42.1.4.jar --jars /path_to_postgresql-42.1.4.jar
    

    Then create a DataFrame, filling in your own connection details:

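    # Redshift speaks the PostgreSQL wire protocol, and the PostgreSQL JDBC driver
    # only accepts jdbc:postgresql:// URLs (Redshift's default port is 5439).
    myDF = spark.read \
        .format("jdbc") \
        .option("url", "jdbc:postgresql://host:port/db_name") \
        .option("driver", "org.postgresql.Driver") \
        .option("dbtable", "table_name") \
        .option("user", "user_name") \
        .option("password", "password") \
        .load()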
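
    The same read can be wrapped in a small helper, which also makes it easy to pull a query instead of a whole table. This is a minimal sketch, assuming Redshift is reached through the PostgreSQL JDBC driver (org.postgresql.Driver, which only accepts jdbc:postgresql:// URLs). The names redshift_jdbc_options and read_redshift and the spark_src alias are hypothetical, not part of Spark. A query is passed through the dbtable option as a parenthesized subquery with an alias, since Spark's JDBC source accepts anything valid in a SQL FROM clause there.

```python
from typing import Dict, Optional

# Assumption: Redshift is reached via the PostgreSQL JDBC driver.
REDSHIFT_DEFAULT_PORT = 5439


def redshift_jdbc_options(
    host: str,
    db_name: str,
    user: str,
    password: str,
    table: Optional[str] = None,
    query: Optional[str] = None,
    port: int = REDSHIFT_DEFAULT_PORT,
) -> Dict[str, str]:
    """Build the option map for spark.read.format("jdbc") (hypothetical helper).

    Pass exactly one of `table` or `query`; a query is wrapped as a
    derived table so Redshift runs it and Spark only receives the result.
    """
    if (table is None) == (query is None):
        raise ValueError("pass exactly one of table or query")
    dbtable = table if table is not None else f"({query}) AS spark_src"
    return {
        # The PostgreSQL driver rejects jdbc:redshift:// URLs.
        "url": f"jdbc:postgresql://{host}:{port}/{db_name}",
        "driver": "org.postgresql.Driver",
        "dbtable": dbtable,
        "user": user,
        "password": password,
    }


def read_redshift(spark, **kwargs):
    """Load a Redshift table or query through an existing SparkSession."""
    return spark.read.format("jdbc").options(**redshift_jdbc_options(**kwargs)).load()
```

    For example, read_redshift(spark, host="host", db_name="db_name", table="table_name", user="user_name", password="password") mirrors the myDF read above, and passing query="select id, name from table_name" instead of table lets Redshift do the filtering. Without the partitionColumn, lowerBound, upperBound and numPartitions options, Spark reads a JDBC table through a single connection, so large tables can be slow to load.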
    

    Spark Version: 2.2
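
    The question's goal was to join data on S3 with data in the Redshift cluster; once the Redshift table is a DataFrame, that is an ordinary Spark join. A minimal sketch, assuming the S3 data is Parquet readable through an s3a:// path (which requires the hadoop-aws connector and AWS credentials to be configured) and that both sides share a key column. The function name join_s3_with_redshift is hypothetical.

```python
def join_s3_with_redshift(spark, s3_path, redshift_df, key):
    """Join Parquet data on S3 with a DataFrame loaded from Redshift.

    Assumptions: the S3 data is Parquet, and `key` names a column that
    exists on both sides. An inner join keeps only matching rows.
    """
    s3_df = spark.read.parquet(s3_path)
    return s3_df.join(redshift_df, on=key, how="inner")
```

    For example, join_s3_with_redshift(spark, "s3a://your-bucket/your-prefix/", myDF, "id"), where the bucket, prefix and "id" column are placeholders for your own.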
