spark-streaming

Get the first elements (take function) of a DStream

拈花ヽ惹草 submitted on 2020-12-13 09:34:27
Question: I am looking for a way to retrieve the first elements of a DStream created as:

val dstream = ssc.textFileStream(args(1)).map(x => x.split(",").map(_.toDouble))

Unfortunately, there is no take function on a DStream (as there is on an RDD):

// dstream.take(2) !!!

Does anyone have an idea how to do this? Thanks.

Answer 1: You can use the transform method of the DStream: take n elements of the input RDD and save them to a list, then filter the original RDD to the elements contained in that list. This will return a new DStream.
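A minimal sketch of that approach, assuming the dstream from the question; the count n is illustrative and not from the original post. Note that transform's body runs on the driver once per micro-batch, so the take happens per batch:

val n = 2  // illustrative count
val firstN = dstream.transform { rdd =>
  // take the first n elements of this batch's RDD, collected to the driver
  val taken = rdd.take(n)
  // keep only matching elements; sameElements compares array contents,
  // since Array equality in Scala is reference equality
  rdd.filter(x => taken.exists(_.sameElements(x)))
}

Because this filters by value, duplicates of the first n values occurring later in the batch pass the filter too. If a side effect is all that is needed, dstream.foreachRDD(rdd => rdd.take(n).foreach(println)) avoids the filtering step entirely.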

Get data from nested JSON in Kafka stream (PySpark)

梦想的初衷 submitted on 2020-11-29 23:59:53
Question: I have a Kafka producer sending large amounts of data in the format

{
  '1000': { '3': { 'seq': '1', 'state': '2', 'CMD': 'XOR' } },
  '1001': { '5': { 'seq': '2', 'state': '2', 'CMD': 'OR' } },
  '1003': { '5': { 'seq': '3', 'state': '4', 'CMD': 'XOR' } }
}
....

The data I want is in the innermost object, e.g. {'seq': '1', 'state': '2', 'CMD': 'XOR'}, and the keys of the enclosing levels ('1000' and '3') are variable. Please note that the above values are only an example; the original dataset is huge, with lots …
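No answer is included in this extract. As a sketch only: with Structured Streaming, the variable outer keys can be modeled as nested MapTypes and flattened with two explodes. The question targets PySpark; from_json and explode exist there with the same semantics, but the sketch below is in Scala to match the section's other code. The broker address and topic name are placeholders:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val spark = SparkSession.builder.appName("nested-json").getOrCreate()
import spark.implicits._

// schema of the innermost record
val record = new StructType()
  .add("seq", StringType)
  .add("state", StringType)
  .add("CMD", StringType)

// the two enclosing levels have variable keys, so model them as maps
val schema = MapType(StringType, MapType(StringType, record))

// hypothetical Kafka source (requires the spark-sql-kafka package)
val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()

// the sample's single-quoted keys parse because the JSON reader's
// allowSingleQuotes option defaults to true
val records = raw
  .select(from_json($"value".cast("string"), schema).as("outer"))
  .select(explode($"outer").as(Seq("outerKey", "inner")))  // e.g. '1000'
  .select(explode($"inner").as(Seq("innerKey", "rec")))    // e.g. '3'
  .select($"rec.seq", $"rec.state", $"rec.CMD")

records is a streaming DataFrame with one row per innermost object; the variable keys remain available as the outerKey and innerKey columns if they are needed downstream. A sink such as records.writeStream.format("console").start() is still needed to run the query.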
