Spark: get multiple DStreams out of a single DStream

Submitted by 房东的猫 on 2019-12-12 02:57:12

Question


Is it possible to get multiple DStreams out of a single DStream in Spark? My use case is as follows: I am getting a stream of log data from an HDFS file. Each log line contains an id (id=xyz). I need to process each log line differently based on its id, so I was trying to create a different DStream for each id from the input DStream. I couldn't find anything related in the documentation. Does anyone know how this can be achieved in Spark, or can you point me to a link for this?

Thanks


Answer 1:


There is no built-in operation that splits a single DStream into multiple DStreams. The best you can do is:

  1. Modify your source system to emit a separate stream for each ID, and then run a separate job to process each stream.
  2. If your source cannot change and only provides a stream that mixes IDs, write custom logic that identifies the ID in each record and then performs the appropriate operation.

I would always prefer #1, as it is the cleaner solution, but there are cases where #2 has to be implemented.
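As a rough illustration of option #2, the custom logic could parse the id out of each line and derive one filtered stream per id from the same input. This is only a sketch under assumptions that are not in the original post: the id is taken to be a word-like token after `id=`, the set of ids is known in advance, and `extract_id` and `split_by_id` are hypothetical helper names. The only stream method it relies on is `filter`, which DStreams provide.

```python
import re

# Assumed log format: each line carries a token such as "id=xyz".
ID_PATTERN = re.compile(r"\bid=(\w+)")


def extract_id(line):
    """Return the id found in a log line, or None if the line has no id."""
    match = ID_PATTERN.search(line)
    return match.group(1) if match else None


def split_by_id(stream, ids):
    """Derive one filtered stream per id from a single input stream.

    `stream` is any object with a filter(func) method, such as a DStream.
    The ids must be known up front (an assumption of this sketch).
    """
    # Bind `wanted` as a default argument so each lambda keeps its own id.
    return {
        wanted: stream.filter(
            lambda line, wanted=wanted: extract_id(line) == wanted
        )
        for wanted in ids
    }
```

With a DStream `lines` of log lines, `split_by_id(lines, ["xyz", "abc"])` would return a dictionary of filtered DStreams, each of which can then get its own processing. Every filter scans the full input batch, so this fits best when the number of distinct ids is small.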



Source: https://stackoverflow.com/questions/34897236/spark-get-multiple-dstream-out-of-a-single-dstream
