How to pass data from Kafka to Spark Streaming?

日久生厌 2021-02-08 03:57

I am trying to pass data from Kafka to Spark Streaming.

This is what I've done so far:

  1. Installed both Kafka and Spark

3 Answers
  • 2021-02-08 04:28

    To print the contents of a DStream, PySpark provides the pprint method, which prints the first elements of each batch once the StreamingContext has been started. So you'll use

    kafkaStream.pprint()

  • 2021-02-08 04:44

    You need to submit spark-streaming-kafka-assembly_*.jar with your job:

    spark-submit --jars spark-streaming-kafka-assembly_2.10-1.5.2.jar ./spark-kafka.py 
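
The `*` in the jar name stands for the Scala and Spark versions the assembly was built for (2.10 and 1.5.2 in the command above), and the Spark part generally needs to match the Spark release you have installed. A minimal sketch of deriving the file name, assuming a hypothetical helper `assembly_jar_name` that is not part of Spark:

```python
def assembly_jar_name(scala_version, spark_version):
    """Build the file name of the Kafka assembly jar (hypothetical helper)."""
    return "spark-streaming-kafka-assembly_{0}-{1}.jar".format(
        scala_version, spark_version
    )


# The jar used in the command above: Scala 2.10, Spark 1.5.2.
jar = assembly_jar_name("2.10", "1.5.2")
print(jar)
```

The resulting name is what you would pass to `--jars`, assuming the file sits in the directory you run spark-submit from.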
    
  • 2021-02-08 04:46

    Alternatively, if you also want to specify the resources to allocate at the same time:

    spark-submit --deploy-mode cluster --master yarn --num-executors 5 --executor-cores 5 --executor-memory 20g --jars spark-streaming-kafka-assembly_2.10-1.6.0.jar ./spark-kafka.py 
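
If you launch the job from a script rather than by hand, the same command can be assembled as an argument list. This is only a sketch: `build_spark_submit` is a hypothetical helper, and the resource values are simply the ones from the command above, not recommendations.

```python
import shlex


def build_spark_submit(script, jars, master=None, deploy_mode=None, **resources):
    """Return the spark-submit command as an argument list (hypothetical helper)."""
    cmd = ["spark-submit"]
    if deploy_mode:
        cmd += ["--deploy-mode", deploy_mode]
    if master:
        cmd += ["--master", master]
    # Keyword names map to flags: num_executors -> --num-executors, and so on.
    for name, value in resources.items():
        cmd += ["--" + name.replace("_", "-"), str(value)]
    # --jars takes a comma-separated list of jar paths.
    cmd += ["--jars", ",".join(jars), script]
    return cmd


cmd = build_spark_submit(
    "./spark-kafka.py",
    ["spark-streaming-kafka-assembly_2.10-1.6.0.jar"],
    master="yarn",
    deploy_mode="cluster",
    num_executors=5,
    executor_cores=5,
    executor_memory="20g",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems.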
    

    If you want to run your code in a Jupyter notebook, the following may help:

    from __future__ import print_function
    import os
    from pyspark import SparkContext, SparkConf
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    if __name__ == "__main__":

        # Must be set before the SparkContext is created; the trailing "pyspark-shell" is required.
        os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars spark-streaming-kafka-assembly_2.10-1.6.0.jar pyspark-shell'

        #conf = SparkConf().setAppName("Kafka-Spark").setMaster("spark://127.0.0.1:7077")
        conf = SparkConf().setAppName("Kafka-Spark")
        sc = SparkContext(conf=conf)
        ssc = StreamingContext(sc, 1)  # 1-second batch interval
        topics = {'spark-kafka': 1}  # topic name -> number of partitions to consume
        # createStream is the receiver-based API: its second argument is the ZooKeeper
        # quorum, not the broker address (localhost:2181 assumes a default local ZooKeeper).
        kafkaStream = KafkaUtils.createStream(ssc, 'localhost:2181', "name", topics)

        kafkaStream.pprint()  # print the first records of each batch
        ssc.start()
        ssc.awaitTermination()
    

    Note the introduction of the following line in __main__:

    os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars spark-streaming-kafka-assembly_2.10-1.6.0.jar pyspark-shell'
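
With more than one jar, the value gets easy to mistype. A small sketch, assuming a hypothetical helper `set_pyspark_submit_args` (not part of PySpark), that builds the string and always appends the required `pyspark-shell` suffix:

```python
import os


def set_pyspark_submit_args(jars):
    """Set PYSPARK_SUBMIT_ARGS for a notebook session (hypothetical helper).

    Call this before the SparkContext is created.
    """
    # --jars takes a comma-separated list; "pyspark-shell" must come last.
    args = "--jars {0} pyspark-shell".format(",".join(jars))
    os.environ["PYSPARK_SUBMIT_ARGS"] = args
    return args


args = set_pyspark_submit_args(["spark-streaming-kafka-assembly_2.10-1.6.0.jar"])
print(args)
```

Run this in the first notebook cell, before anything constructs a SparkContext; setting the variable afterwards is unlikely to have any effect on the running session.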
    

    Sources: https://github.com/jupyter/docker-stacks/issues/154
