How to add third party java jars for use in pyspark

Asked by 没有蜡笔的小新, 2020-11-29 03:08

I have some third-party database client libraries in Java. I want to access them through

java_gateway.py

E.g.: to make the client class (not

9 Answers
  • 2020-11-29 03:38

    I've worked around this by dropping the jars into a drivers directory and then creating a spark-defaults.conf file in the conf folder. Steps to follow:

    To get the conf path:  
    cd ${SPARK_HOME}/conf
    
    vi spark-defaults.conf  
    spark.driver.extraClassPath /Users/xxx/Documents/spark_project/drivers/*
    

    Then run your Jupyter notebook.
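    The `/*` wildcard above is expanded by the JVM's classpath handling. If you ever need the explicit jar list instead (for example for the spark.jars setting, which takes a comma-separated list of files rather than a wildcard), a small helper can build both forms. This is an illustrative sketch; the drivers path is the hypothetical example path from the steps above.

    ```python
    import glob
    import os

    # Hypothetical driver directory from the steps above -- adjust to your setup.
    DRIVER_DIR = "/Users/xxx/Documents/spark_project/drivers"

    def build_classpath(jar_dir):
        """Join every .jar under jar_dir into one classpath string,
        the form spark.driver.extraClassPath accepts (':' on Unix, ';' on Windows)."""
        jars = sorted(glob.glob(os.path.join(jar_dir, "*.jar")))
        return os.pathsep.join(jars)

    def build_jars_list(jar_dir):
        """Comma-separated form, as expected by the spark.jars setting."""
        return ",".join(sorted(glob.glob(os.path.join(jar_dir, "*.jar"))))
    ```

    Either string can then be dropped into spark-defaults.conf or passed via `--conf` on the pyspark command line.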

  • 2020-11-29 03:41
    1. Extract the downloaded jar file.
    2. Edit the system environment variables:
      • Add a variable named SPARK_CLASSPATH and set its value to \path\to\the\extracted\jar\file.

    E.g.: if you extracted the jar file on the C drive into a folder named sparkts, the value should be: C:\sparkts

    3. Restart your cluster.
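    Instead of editing the system settings by hand, the same variable can be set from Python before the JVM gateway is launched. A minimal sketch (C:\sparkts is the example value from the steps above; note that SPARK_CLASSPATH has been deprecated since Spark 1.0 in favor of spark.driver.extraClassPath):

    ```python
    import os

    # Example value from the steps above -- point this at your extracted jar folder.
    os.environ["SPARK_CLASSPATH"] = r"C:\sparkts"

    # Any JVM started after this point in the same process (e.g. the gateway
    # that pyspark launches) inherits the variable. On newer Spark versions,
    # prefer spark.driver.extraClassPath in spark-defaults.conf instead.
    ```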
  • 2020-11-29 03:44

    Apart from the accepted answer, you also have the options below:

    1. If you are in a virtual environment, you can place the jar in the pyspark jars folder inside it,

      e.g. lib/python3.7/site-packages/pyspark/jars

    2. If you want the JVM itself to discover it, you can place it under the ext/ directory of your JRE installation.
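    To find the exact jars folder for option 1 without hard-coding the Python version into the path, you can ask the interpreter where pyspark lives. A sketch, assuming pyspark is installed in the active environment (the helper returns None when it is not):

    ```python
    import importlib.util
    import os

    def pyspark_jars_dir():
        """Return the jars/ folder of the installed pyspark package,
        or None if pyspark cannot be found in this environment."""
        spec = importlib.util.find_spec("pyspark")
        if spec is None or not spec.submodule_search_locations:
            return None
        return os.path.join(list(spec.submodule_search_locations)[0], "jars")
    ```

    Copy your third-party jar into that folder and restart the PySpark session so the new jar is on the driver classpath.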
