Question
To submit a Spark application to a cluster, the Spark documentation notes:
To do this, create an assembly jar (or “uber” jar) containing your code and its dependencies. Both sbt and Maven have assembly plugins. When creating assembly jars, list Spark and Hadoop as provided dependencies; these need not be bundled since they are provided by the cluster manager at runtime. -- http://spark.apache.org/docs/latest/submitting-applications.html
So, I added the Apache Maven Shade Plugin (version 3.0.0) to my pom.xml file.
And I changed my Spark dependency's scope to provided (version 2.1.0).
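As a minimal sketch of what a provided-scope Spark dependency might look like in pom.xml: the artifactId spark-core_2.11 below is an assumption (the question states only the Spark version, 2.1.0), so substitute whichever Spark modules the project actually depends on.

```xml
<!-- Sketch only: the artifactId is assumed; the question gives just the version. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.1.0</version>
  <!-- provided: available at compile time, but not bundled into the uber jar -->
  <scope>provided</scope>
</dependency>
```

Because provided dependencies are left off the runtime classpath, a run launched directly from an IDE may not see Spark at all, which is consistent with having to remove the scope to run locally from IntelliJ.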
(I also added the Apache Maven Assembly Plugin to ensure I was wrapping all of my dependencies in the jar when I run mvn clean package. I'm unsure if it's truly necessary.)
This is how spark-submit fails: it throws a NoSuchMethodError for a dependency I have (note that the code works on a local instance when run from inside IntelliJ, as long as the provided scope is removed).
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.createStarted()Lcom/google/common/base/Stopwatch;
The line of code that throws the error is irrelevant; it's simply the first line in my main method that creates a Stopwatch, part of the Google Guava utilities (version 21.0).
Other solutions online suggest that it has to do with version conflicts of Guava, but I haven't had any luck yet with those suggestions. Any help would be appreciated, thank you.
Answer 1:
If you take a look at the /jars subdirectory of the Spark 2.1.0 installation, you will likely see guava-14.0.1.jar. Per the API documentation for the Guava Stopwatch#createStarted method you are using, createStarted did not exist until Guava 15.0. What is most likely happening is that the Spark process classloader finds the Spark-provided Guava 14.0.1 library before it finds the Guava 21.0 library packaged in your uber jar.
One possible resolution is to use the class-relocation feature of the Maven Shade plugin (which you are already using to build your uber jar). With class relocation, the Shade plugin moves the Guava 21.0 classes your code needs, while packaging the uber jar, from a pattern location reflecting their existing package name (e.g. com.google.common.base) to an arbitrary shadedPattern location that you specify in the Shade configuration (e.g. myguava123.com.google.common.base), and rewrites the references in your bytecode to match.
The result is that the older and newer Guava libraries no longer share a package name, avoiding the runtime conflict.
Answer 2:
Most likely you're having a dependency conflict, yes.
First, check whether the conflict already exists in the jar you build. A quick way is to look inside the jar for Stopwatch.class and inspect its bytecode (for example with javap) to confirm that the createStarted method is present. Alternatively, list the dependency tree and work from there: https://maven.apache.org/plugins/maven-dependency-plugin/examples/resolving-conflicts-using-the-dependency-tree.html
If the jar itself is fine, the conflict may be between your Spark installation and your jar. Look in the lib and jars folders of your Spark installation to see whether they contain a different version of Guava that does not support the createStarted() method of Stopwatch.
Answer 3:
Applying the answers above, the following configuration solved the problem:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.1.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>shade.com.google.common</shadedPattern>
          </relocation>
          <relocation>
            <pattern>com.google.thirdparty.publicsuffix</pattern>
            <shadedPattern>shade.com.google.thirdparty.publicsuffix</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
Source: https://stackoverflow.com/questions/42398720/apache-spark-using-spark-submit-throws-a-nosuchmethoderror