Spark Unable to find JDBC Driver

栀梦 2020-11-28 08:47

So I've been using sbt with assembly to package all my dependencies into a single jar for my Spark jobs. I've got several jobs where I was using c3p0 to set up …

10 Answers
  • 2020-11-28 09:07

    These options are clearly mentioned in the Spark docs: --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar

    The mistake I was making was putting these options after my application's jar.

    However, the correct way is to specify these options immediately after spark-submit:

    spark-submit --driver-class-path /somepath/project/mysql-connector-java-5.1.30-bin.jar --jars /somepath/project/mysql-connector-java-5.1.30-bin.jar --class com.package.MyClass target/scala-2.11/project_2.11-1.0.jar
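    For context, here is a minimal sketch of the kind of job class such a command could run once the driver jar is supplied this way (the URL, table name and credentials below are placeholders, not from the original question):

        import org.apache.spark.sql.SparkSession

        object MyClass {
          def main(args: Array[String]): Unit = {
            val spark = SparkSession.builder().appName("jdbc-example").getOrCreate()
            // Read a MySQL table over JDBC; the driver jar must be on the
            // driver and executor class paths (hence --driver-class-path / --jars).
            val df = spark.read
              .format("jdbc")
              .option("url", "jdbc:mysql://dbhost:3306/mydb")   // placeholder URL
              .option("dbtable", "customer")                    // placeholder table
              .option("user", "dbuser")
              .option("password", "dbpass")
              .option("driver", "com.mysql.jdbc.Driver")
              .load()
            df.show()
          }
        }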

  • 2020-11-28 09:08

    Both the Spark driver and the executors need the MySQL driver on their class path, so specify:

    spark.driver.extraClassPath = <path>/mysql-connector-java-5.1.36.jar
    spark.executor.extraClassPath = <path>/mysql-connector-java-5.1.36.jar
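
    If the job builds its own SparkSession, the executor-side property can also be set in code, while the driver-side path is better passed on spark-submit because the driver JVM has already started by then (see the note in a later answer). A minimal sketch, with the jar path as an assumption:

        import org.apache.spark.sql.SparkSession

        // Sketch only: the jar must already exist at this path on every worker node.
        val spark = SparkSession.builder()
          .appName("jdbc-classpath-example")
          .config("spark.executor.extraClassPath", "/path/to/mysql-connector-java-5.1.36.jar")
          .getOrCreate()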
    
  • 2020-11-28 09:09

    I added the jar file to SPARK_CLASSPATH in spark-env.sh and it works:

    export SPARK_CLASSPATH=$SPARK_CLASSPATH:/local/spark-1.6.3-bin-hadoop2.6/lib/mysql-connector-java-5.1.40-bin.jar
    
  • 2020-11-28 09:10

    There is a simple Java trick to solve this problem: register the JDBC driver explicitly with Class.forName(). For example:

     import java.sql.DriverManager
     import org.apache.spark.rdd.{JdbcRDD, RDD}
     // jdbcUrl, range and partitions are assumed to be defined earlier.
     val customers: RDD[(Int, String)] = new JdbcRDD(sc, () => {
           // Register the driver explicitly so DriverManager can find it.
           Class.forName("com.mysql.jdbc.Driver")
           DriverManager.getConnection(jdbcUrl)
          },
          "SELECT id, name FROM customer WHERE ? < id AND id <= ?",
          0, range, partitions, r => (r.getInt(1), r.getString(2)))
    

    Check the docs

  • 2020-11-28 09:13

    spark.driver.extraClassPath does not work in client mode:

    Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-class-path command line option or in your default properties file.

    The environment variable SPARK_CLASSPATH has been deprecated since Spark 1.0.

    You should first copy the JDBC driver jars to each executor under the same local filesystem path and then use the following options in your spark-submit:

    --driver-class-path "driver_local_file_system_jdbc_driver1.jar:driver_local_file_system_jdbc_driver2.jar"
    --conf "spark.executor.extraClassPath=executors_local_file_system_jdbc_driver1.jar:executors_local_file_system_jdbc_driver2.jar"
    

    For example, in the case of Teradata you need both terajdbc4.jar and tdgssconfig.jar.

    Alternatively, modify compute_classpath.sh on all worker nodes; the Spark documentation says:

    The JDBC driver class must be visible to the primordial class loader on the client session and on all executors. This is because Java’s DriverManager class does a security check that results in it ignoring all drivers not visible to the primordial class loader when one goes to open a connection. One convenient way to do this is to modify compute_classpath.sh on all worker nodes to include your driver JARs.
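
    As a quick diagnostic, the visibility of the driver class on the executors can be checked from inside a running job; a sketch, assuming an existing SparkContext `sc` and using the MySQL driver class from this thread as an example:

        // Runs one tiny task per default partition and reports whether the JDBC
        // driver class resolves on the executors' class path.
        val driverVisible = sc.parallelize(1 to sc.defaultParallelism).map { _ =>
          try { Class.forName("com.mysql.jdbc.Driver"); true }
          catch { case _: ClassNotFoundException => false }
        }.collect().forall(identity)
        println(s"JDBC driver visible on executors: $driverVisible")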

  • 2020-11-28 09:14

    With Spark 2.2.0, the problem was fixed for me by adding the extra class path information for the SparkSession in the Python script:

        spark = SparkSession \
            .builder \
            .appName("Python Spark SQL basic example") \
            .config("spark.driver.extraClassPath", "/path/to/jdbc/driver/postgresql-42.1.4.jar") \
            .getOrCreate()
    

    See the official documentation: https://spark.apache.org/docs/latest/configuration.html

    In my case, Spark is not launched from a CLI command but from the Django framework (https://www.djangoproject.com/).
