Spark Streaming DStream.reduceByKeyAndWindow doesn't work

会有一股神秘感 · Submitted on 2019-12-21 23:01:39

Question


I am using Apache Spark Streaming to do some real-time processing of my web service's API logs. The source stream is just a series of API calls with return codes, and my Spark app mainly aggregates over the raw API call logs, counting how many API calls return each HTTP code.

The batch interval on the source stream is 1 second. Then I do:

inputStream.reduceByKey(_ + _), where inputStream is of type DStream[(String, Int)].

And now I get the result DStream level1. Then I do reduceByKeyAndWindow on level1 over 60 seconds by calling

val level2 = level1.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(60), Seconds(60)) 

Then I want to do further aggregation (say level 3) over a longer period (say 3600 seconds) on top of DStream level2 by calling

val level3 = level2.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(3600), Seconds(3600)) 
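To make the expected semantics concrete, here is a plain-Scala simulation of the three-level pipeline, with no Spark dependency. The data, the per-second batch shape, and the shortened 120-second top window are all hypothetical; the point is only what level2 and level3 should contain if chained windowing worked:

```scala
// Plain-Scala simulation (no Spark) of the windowing semantics described
// above. Each element of level1 stands for one 1-second batch's output:
// a map from HTTP code to count.
object WindowSim {
  type Batch = Map[String, Int]

  // Merge a sequence of batches by summing counts per key,
  // mimicking reduceByKeyAndWindow((a, b) => a + b, ...).
  def mergeBatches(batches: Seq[Batch]): Batch =
    batches.foldLeft(Map.empty[String, Int]) { (acc, b) =>
      b.foldLeft(acc) { case (m, (k, v)) => m.updated(k, m.getOrElse(k, 0) + v) }
    }

  def main(args: Array[String]): Unit = {
    // 120 one-second batches: every second sees one "200", and
    // every 10th second additionally sees one "500".
    val level1: Seq[Batch] = (0 until 120).map { t =>
      if (t % 10 == 0) Map("200" -> 1, "500" -> 1) else Map("200" -> 1)
    }

    // level2: tumbling 60-second windows over level1.
    val level2: Seq[Batch] = level1.grouped(60).map(mergeBatches).toSeq

    // level3: a 120-second tumbling window over level2 (shortened from
    // 3600s for the example) — what the question expects Spark to produce,
    // but observes as empty.
    val level3: Seq[Batch] = level2.grouped(2).map(mergeBatches).toSeq

    println(level2) // two windows, each with 60 "200"s and 6 "500"s
    println(level3) // one window with 120 "200"s and 12 "500"s
  }
}
```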

My problem now is: I only get aggregated data on level2, while level3 is empty.

My understanding is that level3 should not be empty and it should aggregate over level 2 stream.

Of course I can change to let level3 aggregate over level1, instead of level2. But I don't understand why it is not working by aggregating over level2.
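The workaround mentioned above works because summing per key is associative: aggregating level1 directly over the long window gives the same totals that a level3-over-level2 aggregation would. A small plain-Scala check (no Spark; the sample data is hypothetical):

```scala
// Verifies that one long window over level1 equals merging the shorter
// level2 windows — i.e. the workaround loses no information.
object WorkaroundCheck {
  def merge(ms: Seq[Map[String, Int]]): Map[String, Int] =
    ms.foldLeft(Map.empty[String, Int]) { (acc, m) =>
      m.foldLeft(acc) { case (a, (k, v)) => a.updated(k, a.getOrElse(k, 0) + v) }
    }

  def main(args: Array[String]): Unit = {
    // Six hypothetical 1-second batches of per-code counts.
    val level1 = (0 until 6).map(t => Map("200" -> 1, "404" -> (t % 2)))

    // Two-stage aggregation: 3-batch windows, then merge those windows.
    val twoStage = merge(level1.grouped(3).map(merge).toSeq)

    // One-stage workaround: aggregate level1 over the whole long window.
    val oneStage = merge(level1)

    println(twoStage == oneStage) // prints true: the totals agree
  }
}
```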

It seems to me that you can only do one layer of reduceByKeyAndWindow on the source stream; any further layers of reduceByKeyAndWindow on top of an already-windowed stream won't work.

Any ideas?


Answer 1:


Yes, I think this is a bug in Spark Streaming: the window operation on an already-windowed stream does not seem to work. I'm also investigating the cause and will post an update with any findings.

Similar question: Windows of windowed streams not displaying the expected results



Source: https://stackoverflow.com/questions/29961925/spark-streaming-dstream-reducebykeyandwindow-doesnt-work
