Running app jar file on spark-submit in a google dataproc cluster instance

旧城冷巷雨未停 提交于 2019-12-01 20:34:31

问题


I'm running a .jar file that contains all dependencies that I need packaged in it. One of this dependencies is com.google.common.util.concurrent.RateLimiter and already checked it's class file is in this .jar file.

Unfortunately when I hit the command spark-submit on the master node of my google's dataproc-cluster instance I'm getting this error:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.createStarted()Lcom/google/common/base/Stopwatch;
at com.google.common.util.concurrent.RateLimiter$SleepingStopwatch$1.<init>(RateLimiter.java:417)
at com.google.common.util.concurrent.RateLimiter$SleepingStopwatch.createFromSystemTimer(RateLimiter.java:416)
at com.google.common.util.concurrent.RateLimiter.create(RateLimiter.java:130)
at LabeledAddressDatasetBuilder.publishLabeledAddressesFromBlockstem(LabeledAddressDatasetBuilder.java:60)
at LabeledAddressDatasetBuilder.main(LabeledAddressDatasetBuilder.java:144)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

It seems something happened in the sense of overwriting my dependencies. Already decompiled the Stopwatch.class file from this .jar and checked that method is there. That just happened when I ran on that google dataproc instance. I did grep on the process executing the spark-submit and I got the flag -cp like this:

/usr/lib/jvm/java-8-openjdk-amd64/bin/java -cp /usr/lib/spark/conf/:/usr/lib/spark/lib/spark-assembly-1.5.0-hadoop2.7.1.jar:/usr/lib/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/lib/spark/lib/datanucleus-rdbms-3.2.9.jar:/usr/lib/spark/lib/datanucleus-core-3.2.10.jar:/etc/hadoop/conf/:/etc/hadoop/conf/:/usr/lib/hadoop/lib/native/:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/*

Is there anything I can do to solve this problem?

Thank you.


回答1:


As you've found, Dataproc includes Hadoop dependencies on the classpath when invoking Spark. This is done primarily so that using Hadoop input formats, file systems, etc is fairly straight-forward. The downside is that you will end up with Hadoop's guava version which is 11.02 (See HADOOP-10101).

How to work around this depends on your build system. If using Maven, the maven-shade plugin can be used to relocate your version of guava under a new package name. An example of this can be seen in the GCS Hadoop Connector's packaging, but the crux of it is the following plugin declaration in your pom.xml build section:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.3</version>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <relocations>
            <relocation>
              <pattern>com.google.common</pattern>
              <shadedPattern>your.repackaged.deps.com.google.common</shadedPattern>
            </relocation>
          </relocations>
        </execution>
      </execution>
    </plugin>

Similar relocations can be accomplished with the sbt-assembly plugin for sbt, jarjar for ant, and either jarjar or shadow for gradle.



来源:https://stackoverflow.com/questions/33922719/running-app-jar-file-on-spark-submit-in-a-google-dataproc-cluster-instance

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!