How to properly use PySpark to send data to a Kafka broker?


I'm trying to write a simple PySpark job that receives data from a Kafka broker topic, does some transformation on that data, and puts the transformed data on a different Kafka topic.

1 Answer
  • 2021-02-03 13:28

    Here is working code that reads from a Kafka topic into a Spark DStream and writes each batch back to a different Kafka topic:

    import sys
    
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils
    from kafka import KafkaProducer
    
    # A single producer on the driver; in Python 3 message values must be bytes.
    producer = KafkaProducer(bootstrap_servers='localhost:9092')
    
    def handler(message):
        # collect() pulls the whole batch back to the driver,
        # so this pattern only suits small batches.
        records = message.collect()
        for record in records:
            producer.send('spark.out', str(record).encode('utf-8'))
        producer.flush()
    
    def main():
        sc = SparkContext(appName="PythonStreamingDirectKafkaWordCount")
        ssc = StreamingContext(sc, 10)  # 10-second micro-batches
    
        brokers, topic = sys.argv[1:]
        kvs = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers})
        kvs.foreachRDD(handler)
    
        ssc.start()
        ssc.awaitTermination()
    
    if __name__ == "__main__":
        main()
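
    If batches grow large, collecting every record to the driver becomes a bottleneck. A common variant (a sketch, not part of the original answer; the send_partition helper name is mine) sends from the executors instead, creating one producer per partition:

    def send_partition(records):
        # Runs on an executor: the producer is created here rather than on
        # the driver, because a KafkaProducer cannot be pickled into a closure.
        from kafka import KafkaProducer
        producer = KafkaProducer(bootstrap_servers='localhost:9092')
        for record in records:
            producer.send('spark.out', str(record).encode('utf-8'))
        producer.flush()
        producer.close()
    
    def handler(message):
        # Ship the work to each partition instead of collecting to the driver.
        message.foreachPartition(send_partition)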
    

    Either version is submitted with the Kafka streaming assembly jar on the classpath:

    spark-submit --jars spark-streaming-kafka-assembly_2.10-1.6.1.jar s.py localhost:9092 test
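
    Note that KafkaUtils.createDirectStream belongs to the old DStream API for Kafka 0.8, which was dropped in newer Spark releases. On Spark 2.2+ the same round trip is usually written with Structured Streaming instead. A minimal sketch, assuming the spark-sql-kafka package and the topic names used above:

    from pyspark.sql import SparkSession
    
    spark = SparkSession.builder.appName("KafkaRoundTrip").getOrCreate()
    
    # Read the source topic as an unbounded streaming DataFrame.
    source = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "test")
        .load())
    
    # The Kafka sink expects a string or binary 'value' column.
    out = source.selectExpr("CAST(value AS STRING) AS value")
    
    # Write each micro-batch back to the output topic.
    query = (out.writeStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("topic", "spark.out")
        .option("checkpointLocation", "/tmp/kafka-checkpoint")  # required by the Kafka sink
        .start())
    
    query.awaitTermination()

    Submit it with --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.8 (match the Scala and Spark versions of your cluster) instead of the assembly jar.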
    