I am developing a Spark application in Java that listens to a Kafka stream.
I use kafka_2.10-0.10.2.1.
I have set various parameters for the Kafka properties.
As you mentioned in the comment above:
Turned out the issue was with the uber jar not building correctly.
That's exactly the issue. It does relate to how you assemble your Spark application, and I'm afraid you may have chosen the uber-jar approach. In my opinion, that's a waste of your time at both assembly and spark-submit time.
I'd personally prefer the --packages
command-line option, which takes care of pulling down all the necessary dependencies as needed.
$ ./bin/spark-submit --help
...
  --packages                 Comma-separated list of maven coordinates of jars to include
                             on the driver and executor classpaths. Will search the local
                             maven repo, then maven central and any additional remote
                             repositories given by --repositories. The format for the
                             coordinates should be groupId:artifactId:version.
...
That makes your life as a Spark developer easier: it's no longer you who has to wait for maven/sbt to download the dependencies and assemble them together. It's done at spark-submit
time (and perhaps it's someone else's job, too! :))
You should spark-submit as follows:
spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.1.1 ...
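If the artifact had to come from somewhere other than Maven Central, you could also point --repositories at an extra repository, as described in the help output above (the repository URL below is purely hypothetical):

spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.1.1 \
  --repositories https://repo.example.com/maven \
  ...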
The reason for this extra requirement is that the spark-streaming-kafka-0-10
module is not included by default in Spark's CLASSPATH (as it's considered unnecessary most of the time). Adding the --packages
option above triggers loading the module (with its transitive dependencies).
You should not bundle the module in your Spark application's uber jar.
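For completeness, here's a rough sketch (in Java) of the consuming side that relies on that module; the bootstrap servers, group id and topic name are placeholders, not taken from your application:

import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaStreamApp {
  public static void main(String[] args) throws InterruptedException {
    SparkConf conf = new SparkConf().setAppName("KafkaStreamApp");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

    // Kafka consumer properties (all values here are placeholders)
    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", "localhost:9092");
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);
    kafkaParams.put("group.id", "my-consumer-group");
    kafkaParams.put("auto.offset.reset", "latest");
    kafkaParams.put("enable.auto.commit", false);

    Collection<String> topics = Collections.singletonList("my-topic");

    // The classes below come from spark-streaming-kafka-0-10,
    // resolved at spark-submit time via --packages
    JavaInputDStream<ConsumerRecord<String, String>> stream =
        KafkaUtils.createDirectStream(
            jssc,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

    // Print the message values of each micro-batch
    stream.map(record -> record.value()).print();

    jssc.start();
    jssc.awaitTermination();
  }
}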