Spark-Submit: --packages vs --jars

一整个雨季 2021-01-04 14:27

Can someone explain the differences between --packages and --jars in a spark-submit script?

nohup ./bin/spark-submit   --jars ./xxx         

Also, do I require the --packages configuration if the dependency is in my application's pom.xml?

1 Answer
  • 2021-01-04 14:41

    If you run spark-submit --help, it will show:

    --jars JARS                 Comma-separated list of jars to include on the driver
                                  and executor classpaths.
    
    --packages                  Comma-separated list of maven coordinates of jars to include
                                  on the driver and executor classpaths. Will search the local
                                  maven repo, then maven central and any additional remote
                                  repositories given by --repositories. The format for the
                                  coordinates should be groupId:artifactId:version.
    

    If you use --jars:

    Spark doesn't hit Maven; it looks for the specified jars on the local file system, and it also supports the hdfs, http, https, and ftp URL schemes.
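
    For example, a minimal sketch of --jars usage (the jar paths, class name, and application jar below are made up for illustration):

        # Ship one local jar and one jar stored on HDFS to the driver and executor classpaths
        ./bin/spark-submit \
          --class com.example.MyApp \
          --master yarn \
          --jars /opt/libs/custom-udfs.jar,hdfs:///libs/shared-utils.jar \
          ./my-app.jar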

    If you use --packages:

    Spark searches for the specified package in the local Maven repository, then in Maven Central and any additional remote repositories given by --repositories, and then downloads it.
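
    For example (the Kafka connector coordinate is a real Maven artifact, but the extra repository URL is only a placeholder):

        # Resolve the connector from the local Maven repo, Maven Central,
        # or the extra repository given via --repositories, then ship it to driver and executors
        ./bin/spark-submit \
          --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 \
          --repositories https://repo.example.com/maven2 \
          ./my-app.jar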

    Now, coming back to your questions:

    Also, do I require the --packages configuration if the dependency is in my application's pom.xml?

    Ans: No, if you are not importing/using the classes from the jar directly but only need them loaded by some class loader or service loader (e.g. JDBC drivers); otherwise, yes.
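
    A common case for this, as a sketch (the driver version and application jar are assumptions): the application never imports org.postgresql.Driver, yet the driver class must be on the driver and executor classpaths because spark.read.jdbc() loads it reflectively:

        # Ship the JDBC driver even though no application class imports it directly
        ./bin/spark-submit \
          --packages org.postgresql:postgresql:42.2.18 \
          ./my-app.jar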

    BTW, if you are already pinning a specific version of a jar in your pom.xml, why not build an uber/fat jar of your application, or provide the dependency jar via the --jars argument, instead of using --packages?
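
    A sketch of that alternative (the build command and artifact name are hypothetical; the exact jar name depends on your build setup, e.g. maven-shade-plugin or maven-assembly-plugin):

        # Build a fat jar that already bundles the pom.xml dependencies,
        # so no --packages/--jars is needed for them at submit time
        mvn -DskipTests package
        ./bin/spark-submit \
          --class com.example.MyApp \
          target/my-app-1.0-jar-with-dependencies.jar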

    Links for reference:

    Spark docs: Advanced Dependency Management

    Add jars to a Spark job (spark-submit)
