I am working on Kafka streaming and trying to integrate it with Apache Spark. However, when I run the job I get the error below.
This is the error:
It's not clear how you ran the code. If you keep reading the blog, you'll see:
spark-submit \
...
--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 \
sstreaming-spark-out.py
It seems you missed adding the --packages flag.
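For context, a minimal structured-streaming job that such a submit command could run looks roughly like this (the broker address and topic name are placeholders, not taken from your setup):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("StructuredKafkaRead").getOrCreate()

# Reading from Kafka requires the spark-sql-kafka package passed via --packages
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
      .option("subscribe", "my-topic")                      # placeholder topic
      .load())

# Kafka values arrive as binary; cast to string before writing out
query = (df.selectExpr("CAST(value AS STRING)")
         .writeStream
         .format("console")
         .start())

query.awaitTermination()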
In Jupyter, you could add this:
import os
import findspark

# pass the Kafka package to spark-submit via PYSPARK_SUBMIT_ARGS
# (when set manually, this variable must end with pyspark-shell)
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 pyspark-shell'
)

# locate Spark and make pyspark importable, then initialize it
findspark.init()
import pyspark
Note: the _2.11:2.4.0 suffix needs to match your Scala and Spark versions.
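If you're not sure which versions you're running, you can check the Spark version from Python; the Scala version is also printed in the output of spark-submit --version:

import pyspark
print(pyspark.__version__)   # Spark version, e.g. 2.4.0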
I think you need to provide the absolute path of the Kafka JAR file in the spark-submit command, like below:
./bin/spark-submit --jars /path/to/spark-streaming-kafka-0-8-assembly_2.11-2.0.0.jar
You can download the JAR file from here. For more detailed information, refer to this.
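For reference, that assembly JAR backs the older DStream-based Kafka API (spark-streaming-kafka-0-8), not Structured Streaming. A minimal sketch of how it is used, with placeholder broker and topic names:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="KafkaDStreamSketch")
ssc = StreamingContext(sc, 5)   # 5-second micro-batches

# Direct stream against the Kafka 0.8 API; broker and topic are placeholders
stream = KafkaUtils.createDirectStream(
    ssc, ["my-topic"], {"metadata.broker.list": "localhost:9092"})

# records arrive as (key, value) pairs; print the values of each batch
stream.map(lambda kv: kv[1]).pprint()

ssc.start()
ssc.awaitTermination()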