Spark Streaming: RDD queue stream
Use streamingContext.queueStream(queueOfRDDs) to create a DStream backed by a queue of RDDs. The example below adds one RDD to the queue every second, while the DStream is processed every 2 seconds.

cd ....
vim RDDQueueStream.py

```python
#!/usr/bin/env python3
import time

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

if __name__ == "__main__":
    sc = SparkContext(appName="PythonStreamingQueueStream")
    ssc = StreamingContext(sc, 2)  # batch interval: 2 seconds

    # Build a queue of 5 RDDs; each RDD holds the numbers 1..1000
    # split across 10 partitions
    rddQueue = []
    for i in range(5):
        rddQueue += [ssc.sparkContext.parallelize(
            [j for j in range(1, 1001)], 10)]
        time.sleep(1)  # add one RDD to the queue every second

    inputStream = ssc.queueStream(rddQueue)
    mappedStream = inputStream.map(lambda x: (x % 10, 1))
    reducedStream = mappedStream.reduceByKey(lambda a, b: a + b)
    reducedStream.pprint()

    ssc.start()
    ssc.stop(stopSparkContext=True, stopGraceFully=True)
```
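The map/reduce pair above buckets each number by its last digit (x % 10) and then sums the counts per digit. A minimal pure-Python sketch of that same aggregation, with no Spark required (Counter here stands in for reduceByKey):

```python
from collections import Counter

# Simulate one micro-batch: the numbers 1..1000.
batch = range(1, 1001)

# map step: emit a (key, 1) pair keyed by the last digit
pairs = [(x % 10, 1) for x in batch]

# reduceByKey step: sum the 1s for each key
counts = Counter()
for key, one in pairs:
    counts[key] += one

print(dict(counts))  # each digit 0-9 appears exactly 100 times
```

Since 1..1000 covers every last digit equally, each of the 10 keys ends up with a count of 100, which is the output pprint() shows for each batch in the streaming version.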