Spark Unable to find JDBC Driver

栀梦 2020-11-28 08:47

So I've been using sbt with assembly to package all my dependencies into a single jar for my Spark jobs. I've got several jobs where I was using c3p0 to setu

10 answers
  • 2020-11-28 09:07

    These options are clearly mentioned in the Spark docs: --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar

    The mistake I was making was putting these options after my application's jar.

    However, the correct way is to specify these options immediately after spark-submit:

    spark-submit --driver-class-path /somepath/project/mysql-connector-java-5.1.30-bin.jar --jars /somepath/project/mysql-connector-java-5.1.30-bin.jar --class com.package.MyClass target/scala-2.11/project_2.11-1.0.jar

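    For context, anything placed after the application jar is passed to the application itself as ordinary arguments, so spark-submit never sees it. A minimal sketch of the wrong versus the right ordering (jar names and class are the ones from the command above):

        # WRONG: --driver-class-path and --jars end up as arguments to MyClass
        spark-submit --class com.package.MyClass target/scala-2.11/project_2.11-1.0.jar \
            --driver-class-path mysql-connector-java-5.1.30-bin.jar --jars mysql-connector-java-5.1.30-bin.jar

        # RIGHT: all spark-submit options come before the application jar
        spark-submit \
            --driver-class-path mysql-connector-java-5.1.30-bin.jar \
            --jars mysql-connector-java-5.1.30-bin.jar \
            --class com.package.MyClass \
            target/scala-2.11/project_2.11-1.0.jar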
  • 2020-11-28 09:08

    Both the Spark driver and the executors need the MySQL driver on their class path, so specify:

    spark.driver.extraClassPath = <path>/mysql-connector-java-5.1.36.jar
    spark.executor.extraClassPath = <path>/mysql-connector-java-5.1.36.jar
    
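    One common way to apply these two settings (a sketch, assuming the connector jar sits at the same local path on the driver and on every executor node) is to put them in conf/spark-defaults.conf, or to pass them as --conf options to spark-submit:

        # conf/spark-defaults.conf
        spark.driver.extraClassPath    /path/to/mysql-connector-java-5.1.36.jar
        spark.executor.extraClassPath  /path/to/mysql-connector-java-5.1.36.jar

        # or equivalently on the command line (the class name and your-app.jar are placeholders)
        spark-submit \
            --conf spark.driver.extraClassPath=/path/to/mysql-connector-java-5.1.36.jar \
            --conf spark.executor.extraClassPath=/path/to/mysql-connector-java-5.1.36.jar \
            --class com.example.MyApp your-app.jar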
  • 2020-11-28 09:09

    I added the jar file to SPARK_CLASSPATH in spark-env.sh and it works:

    export SPARK_CLASSPATH=$SPARK_CLASSPATH:/local/spark-1.6.3-bin-hadoop2.6/lib/mysql-connector-java-5.1.40-bin.jar
    
  • 2020-11-28 09:10

    There is a simple Java trick to solve your problem: register the driver explicitly with Class.forName(). For example:

     import java.sql.DriverManager
     import org.apache.spark.rdd.{JdbcRDD, RDD}

     val customers: RDD[(Int, String)] = new JdbcRDD(sc, () => {
         // register the JDBC driver so DriverManager can find it
         Class.forName("com.mysql.jdbc.Driver")
         DriverManager.getConnection(jdbcUrl)
       },
       "SELECT id, name FROM customer WHERE ? < id AND id <= ?",
       0, range, partitions, r => (r.getInt(1), r.getString(2)))
    

    Check the docs

  • 2020-11-28 09:13

    spark.driver.extraClassPath does not work in client mode:

    Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-class-path command line option or in your default properties file.

    The environment variable SPARK_CLASSPATH has been deprecated since Spark 1.0+.

    You should first copy the JDBC driver jars to the same local filesystem path on each executor node and then use the following options in your spark-submit:

    --driver-class-path "driver_local_file_system_jdbc_driver1.jar:driver_local_file_system_jdbc_driver2.jar"
    --conf "spark.executor.extraClassPath=executors_local_file_system_jdbc_driver1.jar:executors_local_file_system_jdbc_driver2.jar"
    

    For example, in the case of TeraData you need both terajdbc4.jar and tdgssconfig.jar, as sketched below.

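    Putting the two options above together for the TeraData case, a full spark-submit might look like this sketch (the /opt/jdbc path, class name, and application jar are placeholders):

        spark-submit \
            --driver-class-path "/opt/jdbc/terajdbc4.jar:/opt/jdbc/tdgssconfig.jar" \
            --conf "spark.executor.extraClassPath=/opt/jdbc/terajdbc4.jar:/opt/jdbc/tdgssconfig.jar" \
            --class com.example.MyApp your-app.jar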
    Alternatively, modify compute_classpath.sh on all worker nodes; the Spark documentation says:

    The JDBC driver class must be visible to the primordial class loader on the client session and on all executors. This is because Java’s DriverManager class does a security check that results in it ignoring all drivers not visible to the primordial class loader when one goes to open a connection. One convenient way to do this is to modify compute_classpath.sh on all worker nodes to include your driver JARs.

  • 2020-11-28 09:14

    With Spark 2.2.0, the problem was fixed for me by adding the extra class path information to the SparkSession in the Python script:

        spark = SparkSession \
            .builder \
            .appName("Python Spark SQL basic example") \
            .config("spark.driver.extraClassPath", "/path/to/jdbc/driver/postgresql-42.1.4.jar") \
            .getOrCreate()
    

    See the official documentation: https://spark.apache.org/docs/latest/configuration.html

    In my case, Spark is not launched from the command line but from the Django framework (https://www.djangoproject.com/).
