Spark - Adding JDBC Driver JAR to Google Dataproc

最后都变了 - Submitted on 2019-12-30 19:57:09

Question


I am trying to write via JDBC:

df.write.jdbc("jdbc:postgresql://123.123.123.123:5432/myDatabase", "myTable", props)

The Spark docs explain that the configuration option spark.driver.extraClassPath cannot be used to add JDBC driver JARs when running in client mode (which is the mode Dataproc runs in), because the driver JVM has already been started by the time the option is read.

I tried adding the JAR path in Dataproc's submit command:

gcloud beta dataproc jobs submit spark ... \
     --jars file:///home/bryan/org.postgresql.postgresql-9.4-1203-jdbc41.jar

I also added a call to load the driver class:

  Class.forName("org.postgresql.Driver")

But I still get the error:

java.sql.SQLException: No suitable driver found for jdbc:postgresql://123.123.123.123:5432/myDatabase 

Answer 1:


In my experience, adding the driver class to the connection properties usually solves the problem:

props.put("driver", "org.postgresql.Driver")
df.write.jdbc(url, table, props)
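
For context, here is a minimal self-contained sketch of this approach in Scala. The SparkSession setup and the connection details (host, database, table, credentials) are placeholders, not from the original question:

import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-write").getOrCreate()
val df = spark.range(10).toDF("id")  // stand-in for the real DataFrame

// The "driver" key tells Spark's JDBC data source which driver class to
// load, on the executors as well as the driver, so no Class.forName is needed.
val props = new Properties()
props.put("user", "myUser")          // placeholder credentials
props.put("password", "myPassword")
props.put("driver", "org.postgresql.Driver")

df.write.jdbc("jdbc:postgresql://123.123.123.123:5432/myDatabase", "myTable", props)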



Answer 2:


You may want to try adding --driver-class-path to the very end of your command arguments:

gcloud beta dataproc jobs submit spark ... \
    --jars file:///home/bryan/org.postgresql.postgresql-9.4-1203-jdbc41.jar \
    --driver-class-path /home/bryan/org.postgresql.postgresql-9.4-1203-jdbc41.jar

Another approach, if you're staging the JAR file onto the cluster before the job anyway, is to drop it into /usr/lib/hadoop/lib/, where it is automatically picked up on the driver classpath for both Hadoop and Spark jobs. A sketch of this follows below.
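
As a sketch of that second approach, a Dataproc initialization action could copy the driver in at cluster-creation time (the bucket, script, and JAR paths below are hypothetical):

#!/bin/bash
# Hypothetical init action: copy the JDBC driver onto every node so it is
# on the default driver classpath for Hadoop and Spark jobs.
gsutil cp gs://my-bucket/jars/postgresql-9.4-1203-jdbc41.jar /usr/lib/hadoop/lib/

Attach it when creating the cluster:

gcloud dataproc clusters create my-cluster \
    --initialization-actions gs://my-bucket/scripts/install-jdbc-driver.sh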




Answer 3:


You can add a JAR passed via the --jars argument to the Spark driver classpath using the --properties argument when submitting a Spark job through Dataproc:

$ gcloud dataproc jobs submit spark ... \
    --jars=gs://<BUCKET>/<DIRECTORIES>/<JAR_NAME> \
    --properties=spark.driver.extraClassPath=<JAR_NAME>
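
Note that the classpath entry uses just the JAR's base name: files passed with --jars are staged into the job driver's working directory on the cluster, so the bare name in spark.driver.extraClassPath resolves relative to it.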


Source: https://stackoverflow.com/questions/32958311/spark-adding-jdbc-driver-jar-to-google-dataproc
