PySpark: Failed to find data source: kafka

小蘑菇 2021-01-21 14:24

I am working on Kafka streaming and trying to integrate it with Apache Spark. However, at runtime I get the error below:

    Failed to find data source: kafka

2 Answers
  • 2021-01-21 15:20

    It's not clear how you ran the code. If you keep reading the blog, you will see

    spark-submit \
      ...
      --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 \
      sstreaming-spark-out.py
    

    It seems you missed adding the --packages flag.

    In Jupyter, you could add this:

    import os
    import findspark

    # set up the submit arguments before initializing Spark
    # (the trailing "pyspark-shell" token is required)
    os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 pyspark-shell'

    # locate the Spark installation, then import pyspark
    findspark.init()
    import pyspark
    

    Note: the _2.11:2.4.0 suffix needs to align with your Scala and Spark versions.
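    The coordinate follows the Maven pattern groupId:artifactId_scalaVersion:sparkVersion. As a quick illustration, here is a small hypothetical helper (`kafka_package` is not part of Spark or any library) that makes the pattern explicit:

```python
# Hypothetical helper (not part of Spark): build the Maven coordinate
# for the Kafka SQL connector from your Scala and Spark versions.
def kafka_package(scala_version: str, spark_version: str) -> str:
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_version}:{spark_version}"

# Spark 2.4.0 built against Scala 2.11:
print(kafka_package("2.11", "2.4.0"))
# org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0
```

    Pick the suffix that matches the Scala version your Spark distribution was built with, not the one you wish you had.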

  • 2021-01-21 15:23

    I think you need to provide the absolute path of the Kafka jar file in the spark-submit command, like below:

    ./bin/spark-submit --jars /path/to/spark-streaming-kafka-0-8-assembly_2.11-2.0.0.jar
    

    You can download the jar file from here. For detailed information, refer to this.
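    If you script the submission, it can help to check the jar path before launching; a minimal sketch (the jar path and application script name below are placeholders, not real files):

```python
import os
import shlex

# Placeholder paths -- replace with your actual jar and application script.
jar = "/path/to/spark-streaming-kafka-0-8-assembly_2.11-2.0.0.jar"
app = "sstreaming-spark-out.py"

# Fail early with a clear message if the jar path is wrong,
# instead of a cryptic "Failed to find data source" at runtime.
if not os.path.isfile(jar):
    print(f"warning: jar not found at {jar}")

# Assemble and print the spark-submit command with proper shell quoting.
cmd = ["./bin/spark-submit", "--jars", jar, app]
print(" ".join(shlex.quote(part) for part in cmd))
```

    The point of building the command as a list is that paths containing spaces stay intact when quoted.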
