Getting : Error importing Spark Modules : No module named 'pyspark.streaming.kafka'

Submitted by 别来无恙 on 2020-08-08 20:18:08

Question


I have a requirement to push logs created from a PySpark script to Kafka. I am doing a POC, so I am using the Kafka binaries on a Windows machine. My versions are: Kafka 2.4.0, Spark 3.0, and Python 3.8.1. I am using the PyCharm editor.

import sys
import logging
from datetime import datetime

try:
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

except ImportError as e:
    print("Error importing Spark Modules :", e)
    sys.exit(1)

I am getting this error:

Error importing Spark Modules : No module named 'pyspark.streaming.kafka'

What am I missing here? Is a library missing? PySpark and Spark Streaming are otherwise working fine. I would appreciate it if someone could provide some guidance here.


Answer 1:


Spark Streaming (the DStream API) was deprecated as of Spark 2.4, and the `pyspark.streaming.kafka` module was removed in Spark 3.0, which is why the import fails.

You should be using Structured Streaming instead, via the pyspark.sql modules.
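As a minimal sketch of the Structured Streaming approach, something like the following replaces the `KafkaUtils` import. The broker address (`localhost:9092`) and topic name (`logs`) are assumptions for illustration, and the script needs a running Kafka broker plus the `spark-sql-kafka-0-10` package on the classpath (e.g. via `spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0`), so it will not run standalone:

```python
from pyspark.sql import SparkSession

# Structured Streaming replaces pyspark.streaming.kafka:
# Kafka is just another readStream source under pyspark.sql.
spark = SparkSession.builder.appName("kafka-structured-streaming").getOrCreate()

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
      .option("subscribe", "logs")                          # assumed topic
      .load())

# Kafka delivers key/value as binary columns; cast them to strings.
messages = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

# Print incoming messages to the console for the POC.
query = (messages.writeStream
         .format("console")
         .start())
query.awaitTermination()
```

Writing back to Kafka works the same way in reverse: a `writeStream.format("kafka")` sink with a `value` column and a `topic` option.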




Answer 2:


The issue was with the versions of Python and Spark I was using. I was on Python 3.8, which PySpark did not yet fully support, so I changed to Python 3.7. Also, Spark 3 was still in preview at the time; I changed it to 2.4.5, and it worked.
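On Spark 2.4.x the `pyspark.streaming.kafka` module still exists, but it needs the external Kafka assembly on the classpath at submit time. A sketch of the submit command, assuming Spark 2.4.5 built against Scala 2.11 (the artifact coordinates and script name are illustrative, not taken from the question):

```shell
# Pull the legacy DStream Kafka integration from Maven and run the script.
# spark-streaming-kafka-0-8 is the artifact that provides
# pyspark.streaming.kafka.KafkaUtils on Spark 2.4.x.
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.4.5 \
  your_script.py
```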



Source: https://stackoverflow.com/questions/60187069/getting-error-importing-spark-modules-no-module-named-pyspark-streaming-kaf
