Spark Streaming reduceByKeyAndWindow for moving average calculation

Submitted by 不羁岁月 on 2019-12-24 17:58:28

Question


I need to calculate a moving average over a Kinesis stream of data. The window size and slide interval will be given as inputs, and I need to compute the moving average and plot it.

From the docs I understand how to use reduceByKeyAndWindow to get a rolling sum, and I understand how to get the count per window as well. What I am not clear on is how to combine these to get the average, nor how to define an average-calculating function inside reduceByKeyAndWindow. Any help would be appreciated.

Sample code below:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

def createContext():
    sc = SparkContext(appName="PythonSparkStreaming")
    sc.setLogLevel("ERROR")
    ssc = StreamingContext(sc, 5)
    # Checkpointing is required both for windowed operations with an inverse
    # function and for StreamingContext.getOrCreate to recover the context
    ssc.checkpoint('/tmp/checkpoint_v06')

    # Define Kinesis consumer (appName, streamName, endpointUrl and regionName
    # are assumed to be defined elsewhere)
    kinesisStream = KinesisUtils.createStream(ssc,
                                        appName,
                                        streamName,
                                        endpointUrl,
                                        regionName,
                                        InitialPositionInStream.LATEST,
                                        10)

    # Count the number of records in each batch
    count_this_batch = kinesisStream.count().map(lambda x: ('Count this batch: %s' % x))

    # Count by windowed time period (window = 60s, slide = 5s)
    count_windowed = kinesisStream.countByWindow(60, 5).map(lambda x: ('Counts total (One minute rolling count): %s' % x))

    # Rolling sum over the same window (expects a DStream of (key, value) pairs)
    sum_window = kinesisStream.reduceByKeyAndWindow(lambda x, y: x + y, lambda x, y: x - y, 60, 5)
    return ssc

ssc = StreamingContext.getOrCreate('/tmp/checkpoint_v06', lambda: createContext())
ssc.start()
ssc.awaitTermination()
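
For illustration only, here is a minimal sketch of one common way to turn a windowed sum and count into a moving average: pair each value with a count of 1, reduce both the sum and the count over the window, then divide. It assumes the stream has first been mapped to (key, numeric value) pairs; the names parsed, extract_key and extract_value are hypothetical and not from the original post.

# Sketch only: 'parsed' is assumed to be a DStream of (key, numeric value) pairs,
# e.g. parsed = kinesisStream.map(lambda rec: (extract_key(rec), float(extract_value(rec))))

def add_pair(a, b):
    # Combine (sum, count) pairs as records enter the window
    return (a[0] + b[0], a[1] + b[1])

def subtract_pair(a, b):
    # Remove (sum, count) pairs as records leave the window
    return (a[0] - b[0], a[1] - b[1])

# Pair every value with a count of 1, then keep a rolling (sum, count) per key
sum_count_window = parsed.mapValues(lambda v: (v, 1)) \
                         .reduceByKeyAndWindow(add_pair, subtract_pair, 60, 5)

# Divide sum by count to get the moving average; drop keys whose count fell to 0
avg_window = sum_count_window.filter(lambda kv: kv[1][1] > 0) \
                             .mapValues(lambda p: p[0] / p[1])
avg_window.pprint()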

Source: https://stackoverflow.com/questions/51838194/spark-streaming-reducebykeyandwindow-for-moving-average-calculation
