How to compile/package Spark 2.0 project with external jars and Maven

我在风中等你 2021-01-24 09:31

Since version 2.0, Apache Spark is bundled with a folder "jars" full of .jar files. Obviously Maven will download all these jars when issuing:

mvn -e package
1 Answer
  • 2021-01-24 10:13

    I am not sure whether I understand your problem, but I will try to answer.

    Based on Spark's Bundling Your Application's Dependencies documentation:

    When creating assembly jars, list Spark and Hadoop as provided dependencies; these need not be bundled since they are provided by the cluster manager at runtime.

    You can set the scope to provided in your Maven pom.xml file:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
        <!-- add this scope -->
        <scope>provided</scope>
    </dependency>
    

    The second thing I noticed is that the Maven build creates an empty JAR:

    [WARNING] JAR will be empty - no content was marked for inclusion!
    

    If you have any other dependencies, you should package them into the final jar archive file.

    You can do something like below in pom.xml and run mvn package:

        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.6</version>
            <configuration>
                <!-- package with project dependencies -->
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <archive>
                    <manifest>
                        <mainClass>YOUR_MAIN_CLASS</mainClass>
                    </manifest>
                </archive>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    

    The Maven log should then print a line about building the jar:

    [INFO] --- maven-assembly-plugin:2.4.1:single (make-assembly) @ dateUtils ---
    [INFO] Building jar: path/target/APPLICATION_NAME-jar-with-dependencies.jar
    

    After the Maven package phase, you should see DataFetch-1.0-SNAPSHOT-jar-with-dependencies.jar in the target folder, and you can submit this jar with spark-submit.
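
    For illustration, submitting the assembled jar could look like the following (the main class name and the local[2] master are placeholders, not taken from the question):

    spark-submit \
        --class YOUR_MAIN_CLASS \
        --master local[2] \
        target/DataFetch-1.0-SNAPSHOT-jar-with-dependencies.jar

    Because Spark is declared with provided scope, the cluster's own Spark jars are used at runtime rather than being duplicated inside the assembly.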
