PySpark: Failed to find data source: kafka

小蘑菇 2021-01-21 14:24

I am working on Kafka streaming and trying to integrate it with Apache Spark. However, when I run the job it fails with the error below.

This is the error: Failed to find data source: kafka
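
A minimal example of the kind of read that triggers this error is sketched below; the broker address and topic name are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

    # The lookup of the "kafka" data source happens in load(); without the
    # spark-sql-kafka package on the classpath it raises
    # "Failed to find data source: kafka"
    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
          .option("subscribe", "test-topic")                    # placeholder topic
          .load())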

2 Answers
  •  太阳男子 2021-01-21 15:20

    It's not clear how you ran the code. If you keep reading the blog, you will see:

    spark-submit \
      ...
      --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 \
      sstreaming-spark-out.py
    

    It seems you missed adding the --packages flag.

    In Jupyter, you could add this:

    import os
    import findspark

    # point spark-submit at the Kafka connector; the trailing "pyspark-shell"
    # is required when PYSPARK_SUBMIT_ARGS is set by hand
    os.environ['PYSPARK_SUBMIT_ARGS'] = (
        '--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 pyspark-shell'
    )

    # initialize spark
    findspark.init()
    import pyspark

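    This only takes effect if it runs before the SparkSession (or SparkContext) is created, so set the variable first and then build the session as usual; the app name below is just a placeholder:

    from pyspark.sql import SparkSession

    # The --packages argument is applied when the first session starts,
    # so PYSPARK_SUBMIT_ARGS must already be set at this point
    spark = SparkSession.builder.appName("kafka-demo").getOrCreate()
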
    Note: the _2.11:2.4.0 suffix needs to match your Scala and Spark versions.
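
    If you are not sure which versions you are running, you can check them from a live session and from the command line; this assumes a SparkSession object named spark:

    # Spark version of the running session, e.g. "2.4.0" -> the ":2.4.0" part
    print(spark.version)

    # On the command line, `spark-submit --version` prints a banner with a line
    # like "Using Scala version 2.11.x", which gives the "_2.11" part of the
    # package coordinate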
