Question
I have a requirement to push logs created from a PySpark script to Kafka. I am doing a POC, so I am using the Kafka binaries on a Windows machine. My versions are: Kafka 2.4.0, Spark 3.0, and Python 3.8.1. I am using the PyCharm editor.
import sys
import logging
from datetime import datetime

try:
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils
except ImportError as e:
    print("Error importing Spark Modules :", e)
    sys.exit(1)
I am getting this error:
Error importing Spark Modules : No module named 'pyspark.streaming.kafka'
What am I missing here? Is a library missing? PySpark and Spark Streaming otherwise work fine. I would appreciate it if someone could provide some guidance here.
Answer 1:
Spark Streaming (the DStream API) was deprecated as of Spark 2.4, and its Kafka module (pyspark.streaming.kafka) no longer exists in Spark 3.0, which is why the import fails. You should be using Structured Streaming instead, via the pyspark.sql modules.
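As an illustration, here is a minimal Structured Streaming sketch that pushes log lines to Kafka as a batch write. The broker address localhost:9092 and the topic name logs are assumptions, as are the sample log lines; the job also needs the Kafka connector on the classpath.

from pyspark.sql import SparkSession

# Run with the Kafka connector on the classpath, e.g.:
#   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 push_logs.py
spark = SparkSession.builder.appName("push-logs-to-kafka").getOrCreate()

# Any DataFrame with a string "value" column can be written to Kafka;
# these two hypothetical log lines stand in for real application logs
logs = spark.createDataFrame([("job started",), ("job finished",)], ["value"])

(logs.write
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # assumed local broker
    .option("topic", "logs")                              # assumed topic name
    .save())

For a continuous pipeline, the same format("kafka") sink works with writeStream, which additionally requires a checkpointLocation option.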
Answer 2:
The issue was with the Python and Spark versions I was using. Python 3.8 was not yet fully supported by PySpark, so I changed to Python 3.7. Spark 3 was also still in preview, so I changed to Spark 2.4.5, and it worked.
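For reference, on Spark 2.4.5 the original import works against the old Kafka 0.8 DStream integration. A minimal sketch follows; the broker localhost:9092 and topic logs are assumptions, and the job would be launched with something like spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.4.5 app.py.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="legacy-kafka-dstream")
ssc = StreamingContext(sc, 5)  # 5-second micro-batches

# Direct stream against the assumed local broker and topic
stream = KafkaUtils.createDirectStream(
    ssc, ["logs"], {"metadata.broker.list": "localhost:9092"})

# Each record arrives as a (key, value) pair; print the values
stream.map(lambda kv: kv[1]).pprint()

ssc.start()
ssc.awaitTermination()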
Source: https://stackoverflow.com/questions/60187069/getting-error-importing-spark-modules-no-module-named-pyspark-streaming-kaf