PySpark: Failed to find data source: kafka

小蘑菇 2021-01-21 14:24

I am working on Kafka streaming and trying to integrate it with Apache Spark. However, when I run the job it fails with the error below.

This is the error: Failed to find data source: kafka
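
A minimal example of the kind of read that triggers this error is sketched below; the broker address and topic name are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

    # The lookup of the "kafka" data source happens in load(); without the
    # spark-sql-kafka package on the classpath it raises
    # "Failed to find data source: kafka"
    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
          .option("subscribe", "test-topic")                    # placeholder topic
          .load())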

2 Answers
  •  太阳男子 2021-01-21 15:20

    It's not clear how you ran the code. If you keep reading the blog, you will see:

    spark-submit \
      ...
      --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 \
      sstreaming-spark-out.py
    

    It seems you missed adding the --packages flag.

    In Jupyter, you could add this:

    import os
    import findspark

    # point spark-submit at the Kafka connector; the trailing "pyspark-shell"
    # is required when PYSPARK_SUBMIT_ARGS is set by hand
    os.environ['PYSPARK_SUBMIT_ARGS'] = (
        '--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 pyspark-shell'
    )

    # initialize spark
    findspark.init()
    import pyspark

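    This only takes effect if it runs before the SparkSession (or SparkContext) is created, so set the variable first and then build the session as usual; the app name below is just a placeholder:

    from pyspark.sql import SparkSession

    # The --packages argument is applied when the first session starts,
    # so PYSPARK_SUBMIT_ARGS must already be set at this point
    spark = SparkSession.builder.appName("kafka-demo").getOrCreate()
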
    Note: the _2.11:2.4.0 suffix needs to match your Scala and Spark versions.
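
    If you are not sure which versions you are running, you can check them from a live session and from the command line; this assumes a SparkSession object named spark:

    # Spark version of the running session, e.g. "2.4.0" -> the ":2.4.0" part
    print(spark.version)

    # On the command line, `spark-submit --version` prints a banner with a line
    # like "Using Scala version 2.11.x", which gives the "_2.11" part of the
    # package coordinate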
