PySpark: Failed to find data source: kafka

小蘑菇 2021-01-21 14:24

I am working on Kafka streaming and trying to integrate it with Apache Spark. However, at runtime I get the error below:

    Failed to find data source: kafka

2 Answers
  • 2021-01-21 15:20

    It's not clear how you ran the code. If you keep reading the blog, you will see

    spark-submit \
      ...
      --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 \
      sstreaming-spark-out.py
    

    It seems you missed adding the --packages flag.

    In Jupyter, you could add this:

    import os
    import findspark

    # set up the submit arguments before initializing Spark
    # (the trailing "pyspark-shell" token is required)
    os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 pyspark-shell'

    # locate the Spark installation, then import pyspark
    findspark.init()
    import pyspark
    

    Note: the _2.11:2.4.0 suffix needs to align with your Scala and Spark versions.
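    The coordinate follows the Maven pattern groupId:artifactId_scalaVersion:sparkVersion. As a quick illustration, here is a small hypothetical helper (`kafka_package` is not part of Spark or any library) that makes the pattern explicit:

```python
# Hypothetical helper (not part of Spark): build the Maven coordinate
# for the Kafka SQL connector from your Scala and Spark versions.
def kafka_package(scala_version: str, spark_version: str) -> str:
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_version}:{spark_version}"

# Spark 2.4.0 built against Scala 2.11:
print(kafka_package("2.11", "2.4.0"))
# org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0
```

    Pick the suffix that matches the Scala version your Spark distribution was built with, not the one you wish you had.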

  • 2021-01-21 15:23

    I think you need to provide the absolute path of the Kafka jar file in the spark-submit command, like below:

    ./bin/spark-submit --jars /path/to/spark-streaming-kafka-0-8-assembly_2.11-2.0.0.jar
    

    You can download the jar file from here. For detailed information, refer to this.
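    If you script the submission, it can help to check the jar path before launching; a minimal sketch (the jar path and application script name below are placeholders, not real files):

```python
import os
import shlex

# Placeholder paths -- replace with your actual jar and application script.
jar = "/path/to/spark-streaming-kafka-0-8-assembly_2.11-2.0.0.jar"
app = "sstreaming-spark-out.py"

# Fail early with a clear message if the jar path is wrong,
# instead of a cryptic "Failed to find data source" at runtime.
if not os.path.isfile(jar):
    print(f"warning: jar not found at {jar}")

# Assemble and print the spark-submit command with proper shell quoting.
cmd = ["./bin/spark-submit", "--jars", jar, app]
print(" ".join(shlex.quote(part) for part in cmd))
```

    The point of building the command as a list is that paths containing spaces stay intact when quoted.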
