How to pass data from Kafka to Spark Streaming?

日久生厌 2021-02-08 03:57

I am trying to pass data from Kafka to Spark Streaming.

This is what I've done so far:

Installed both Kafka and Spark

3 answers

臣服心动 (original poster)

2021-02-08 04:46

Alternatively, if you want to also specify resources to be allocated at the same time:

spark-submit --deploy-mode cluster --master yarn --num-executors 5 --executor-cores 5 --executor-memory 20g --jars spark-streaming-kafka-assembly_2.10-1.6.0.jar ./spark-kafka.py
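If you launch the job from a script, the same command can be assembled programmatically. The following is only a sketch: the helper name `spark_submit_command` and its parameters are made up for illustration, and the default values simply mirror the command above.

```python
import shlex


def spark_submit_command(script, jars, num_executors=5,
                         executor_cores=5, executor_memory="20g"):
    """Build the spark-submit argument list (hypothetical helper)."""
    return [
        "spark-submit",
        "--deploy-mode", "cluster",
        "--master", "yarn",
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        "--executor-memory", executor_memory,
        "--jars", ",".join(jars),  # --jars takes a comma-separated list
        script,
    ]


cmd = spark_submit_command(
    "./spark-kafka.py",
    ["spark-streaming-kafka-assembly_2.10-1.6.0.jar"],
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

The list form can be handed to `subprocess.run(cmd)` directly, which avoids shell quoting problems.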

If you want to run your code in a Jupyter notebook, the following may help:

from __future__ import print_function
import os
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

if __name__ == "__main__":

    # Must be set before the SparkContext is created; the trailing "pyspark-shell" is required.
    os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars spark-streaming-kafka-assembly_2.10-1.6.0.jar pyspark-shell'

    #conf = SparkConf().setAppName("Kafka-Spark").setMaster("spark://127.0.0.1:7077")
    conf = SparkConf().setAppName("Kafka-Spark")
    #sc = SparkContext(appName="KafkaSpark")
    sc = SparkContext(conf=conf)
    stream = StreamingContext(sc, 1)  # 1-second batch interval
    map1 = {'spark-kafka': 1}  # topic name -> number of partitions to consume
    # createStream expects the ZooKeeper quorum (usually port 2181), not the Kafka broker address (9092).
    kafkaStream = KafkaUtils.createStream(stream, 'localhost:2181', "name", map1)

    print("kafkastream=", kafkaStream)
    sc.stop()

Note the introduction of the following line in __main__:

os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars spark-streaming-kafka-assembly_2.10-1.6.0.jar pyspark-shell'

Sources: https://github.com/jupyter/docker-stacks/issues/154
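Since forgetting the trailing `pyspark-shell` is an easy mistake, one option is to build the variable with a small helper. This is a minimal sketch, not part of PySpark: `build_submit_args` is a hypothetical name, and it assumes the jar sits in the notebook's working directory.

```python
import os


def build_submit_args(jars, extra_args=()):
    """Return a PYSPARK_SUBMIT_ARGS value that always ends with 'pyspark-shell'."""
    if not jars:
        raise ValueError("at least one jar path is required")
    parts = ["--jars", ",".join(jars)]  # --jars takes a comma-separated list
    parts.extend(extra_args)
    parts.append("pyspark-shell")  # must be the last token
    return " ".join(parts)


# Set this before the first SparkContext is created in the notebook.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(
    ["spark-streaming-kafka-assembly_2.10-1.6.0.jar"]
)
print(os.environ["PYSPARK_SUBMIT_ARGS"])
```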
