I have written a simple Kafka stream using Scala. It works fine locally. I built a fat jar and submitted it to the Spark cluster, but after submitting I get a class not found error.
Try changing your spark-streaming-kafka dependency to
"org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.2.0"
Then build a fresh fat jar and see whether this solves the problem.
The final build.sbt looks like this:
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-hive" % sparkVersion % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % connectorVersion,
  "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.2.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.2.0"
)
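One thing worth checking in a build like this: a dependency marked "provided" is left out of the fat jar, and the spark-streaming-kafka integration is not part of a standard Spark installation, so it normally has to be packaged with the application. Spark core and spark-streaming, on the other hand, are already on the cluster and can stay "provided". A minimal sketch of that layout follows, assuming Spark 2.2.0 on Scala 2.11 and the sbt-assembly plugin; the project name and version numbers are placeholders, not the poster's actual settings.

```scala
// build.sbt: a minimal sketch under the assumptions stated above.
// Assumes sbt-assembly is enabled in project/plugins.sbt.
name := "kafka-stream-example" // hypothetical project name

scalaVersion := "2.11.8" // assumed; should match the Scala version of the cluster's Spark build

val sparkVersion = "2.2.0" // assumed; should match the Spark version on the cluster

libraryDependencies ++= Seq(
  // Already present in a Spark installation, so excluded from the fat jar.
  "org.apache.spark" %% "spark-core"      % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  // Not shipped with Spark: no "provided" here, so it is bundled into the fat jar.
  "org.apache.spark" %% "spark-streaming-kafka-0-8" % sparkVersion
)
```

With this layout, the Kafka integration classes end up inside the assembled jar while the Spark classes are taken from the cluster at run time.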
You are missing the kafka dependency:
"org.apache.kafka" %% "kafka" % "0.10.1.0"
Add that and you will be all good.
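Which dependency is actually missing depends on the class named in the error. As a rough diagnostic, a small object like the one below can be packaged into the same fat jar to see which of the usual classes can be loaded on the cluster. ClasspathCheck and isLoadable are hypothetical names chosen for this sketch, not part of Spark or Kafka; the class names listed are the KafkaUtils entry points of the 0-8 and 0-10 integrations and the Kafka consumer client.

```scala
// Hypothetical helper for illustration: reports which Kafka-related classes are on the classpath.
object ClasspathCheck {

  // Returns true if the named class can be loaded from the current classpath.
  def isLoadable(className: String): Boolean =
    try {
      Class.forName(className)
      true
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }

  def main(args: Array[String]): Unit = {
    val candidates = Seq(
      "org.apache.spark.streaming.kafka.KafkaUtils",    // from spark-streaming-kafka-0-8
      "org.apache.spark.streaming.kafka010.KafkaUtils", // from spark-streaming-kafka-0-10
      "org.apache.kafka.clients.consumer.KafkaConsumer" // from kafka-clients
    )
    candidates.foreach { name =>
      val status = if (isLoadable(name)) "found" else "MISSING"
      println(s"$name -> $status")
    }
  }
}
```

Submitting the assembled jar with spark-submit --class ClasspathCheck prints one line per class; anything reported as MISSING points to the artifact that still needs to be bundled into the fat jar.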
I have my build.sbt set up like this:
"org.apache.kafka" %% "kafka" % "0.10.1.0",
"org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.2.0",
"org.apache.spark" %% "spark-streaming" % "2.2.0" % "provided"