Why does Spark application fail with "Exception in thread "main" java.lang.NoClassDefFoundError: …StringDeserializer"?

Submitted by 寵の児 on 2019-12-13 17:12:02

Question


I am developing a Spark application in Java that listens to a Kafka stream.

I use kafka_2.10-0.10.2.1.

I have set various parameters for Kafka properties: bootstrap.servers, key.deserializer, value.deserializer, etc.
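The relevant part of my setup looks roughly like the following simplified sketch (the broker address, group id, and topic name are placeholders, not my real values):

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaStreamSketch {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("kafka-stream-sketch");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

            // Kafka consumer properties; the deserializer entries reference the
            // StringDeserializer class that the error below reports as missing
            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "localhost:9092");
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "example-group");

            // Direct stream from the spark-streaming-kafka-0-10 integration
            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Arrays.asList("example-topic"), kafkaParams));

            // Print the message values of each micro-batch
            stream.map(ConsumerRecord::value).print();

            jssc.start();
            jssc.awaitTermination();
        }
    }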

My application compiles fine, but when I submit it, it fails with the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/StringDeserializer

I do use StringDeserializer for key.deserializer and value.deserializer so it's indeed related to how I wrote my application.

Various Maven dependencies used in pom.xml:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>2.1.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>

I have tried updating the version of spark-streaming/kafka, but could not find much about this error anywhere.


Answer 1:


spark-streaming_2.10

This artifact depends on Scala 2.10, while your other dependencies use Scala 2.11.

Switching that artifact to its _2.11 version is the correct fix for the current error.

Also make sure that the 0-10 part of spark-streaming-kafka-0-10 matches the version of Kafka you're running (0.10.2.1 in the question).
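For example, the dependency block from the question could be aligned on Scala 2.11 along these lines (only a sketch; Spark stays at 2.1.1 as in the question):

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <!-- _2.11 instead of _2.10, so every Spark artifact uses the same Scala version -->
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <!-- the 0-10 connector matches the Kafka 0.10.x broker used in the question -->
        <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>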

Application is compiling fine but when I am trying to submit the spark job, it's showing error: Exception in thread "main" java.lang.NoClassDefFoundError:

By default, Maven does not include dependency jars when it builds the target jar.
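If you do want Maven to build an uber jar that bundles the Kafka classes, a plugin such as maven-shade-plugin has to be configured explicitly. A minimal sketch (the plugin version here is illustrative; Spark's own artifacts are usually marked as provided so they are not bundled):

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.4</version>
                <executions>
                    <execution>
                        <!-- bundle compile-scope dependencies into the jar at package time -->
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>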




Answer 2:


As you mentioned in the comment above:

Turned out issue was with uber jar not building correctly.

That's exactly the issue. It does relate to how you assemble your Spark application, and I'm worried that you may have chosen the uber-jar route. In my opinion, it's a waste of your time at both assembly and spark-submit time.

I'd personally prefer using the --packages command-line option, which takes care of pulling down all the necessary dependencies as needed.

$ ./bin/spark-submit --help
...
  --packages                  Comma-separated list of maven coordinates of jars to include
                              on the driver and executor classpaths. Will search the local
                              maven repo, then maven central and any additional remote
                              repositories given by --repositories. The format for the
                              coordinates should be groupId:artifactId:version.
...

That makes your life as a Spark developer easier: you no longer have to wait for maven/sbt to download the dependencies and assemble them together. It's done at spark-submit time (and perhaps it's someone else's job, too! :))

You should spark-submit as follows:

spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.1.1 ...

The reason for this extra requirement is that the spark-streaming-kafka-0-10 module is not included by default in Spark's CLASSPATH (as it's considered unnecessary most of the time). By using the --packages option above, you trigger loading of the module (along with its transitive dependencies).

You should not bundle the module in your Spark application's uber jar.



Source: https://stackoverflow.com/questions/44748924/why-does-spark-application-fail-with-exception-in-thread-main-java-lang-nocla
