Not able to persist the DStream for use in next batch

Submitted by 依然范特西╮ on 2019-12-01 12:16:17

Question:

```java
JavaRDD<String> history_ = sc.emptyRDD();
java.util.Queue<JavaRDD<String>> queue = new LinkedList<JavaRDD<String>>();
queue.add(history_);
JavaDStream<String> history_dstream = ssc.queueStream(queue);

JavaPairDStream<String, ArrayList<String>> history = history_dstream.mapToPair(r -> {
    return new Tuple2<String, ArrayList<String>>(null, null);
});

JavaPairInputDStream<String, GenericData.Record> stream_1 =
    KafkaUtils.createDirectStream(ssc, String.class, GenericData.Record.class,
        StringDecoder.class, GenericDataRecordDecoder.class, props, topicsSet_1);

JavaPairInputDStream<String, GenericData
```
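The behavior the question is reaching for, where results computed in one batch are visible in the next, is what Spark Streaming's stateful operators (`updateStateByKey` / `mapWithState`) provide; a `queueStream` seeded with a single empty RDD does not carry anything forward. As a rough sketch of the `updateStateByKey` semantics in plain Python (no Spark dependency; `update_fn` and the list-of-strings state are illustrative assumptions, not the asker's code):

```python
from collections import defaultdict

def update_fn(new_values, running_state):
    # Hypothetical state: a running list of string values per key.
    # Receives this batch's new values plus the previous state,
    # and returns the new state -- the updateStateByKey contract.
    state = list(running_state) if running_state is not None else []
    state.extend(new_values)
    return state

def apply_batch(state_by_key, batch_pairs):
    """Apply one micro-batch of (key, value) pairs to the keyed state."""
    grouped = defaultdict(list)
    for k, v in batch_pairs:
        grouped[k].append(v)
    new_state = dict(state_by_key)
    for k, vals in grouped.items():
        new_state[k] = update_fn(vals, state_by_key.get(k))
    return new_state

# Two successive "batches": state from batch 1 is visible in batch 2.
state = {}
state = apply_batch(state, [("a", "x"), ("b", "y")])
state = apply_batch(state, [("a", "z")])
# state["a"] == ["x", "z"]
```

In real Spark Streaming, `updateStateByKey` also requires checkpointing to be enabled so the state RDD can be recovered across batches.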

Transformed DStream in pyspark gives error when pprint called on it

Submitted by 拥有回忆 on 2019-12-01 06:23:44

Question: I'm exploring Spark Streaming through PySpark, and I hit an error when I try to use the transform function with take. I can successfully use sortBy against the DStream via transform and pprint the result:

```python
author_counts_sorted_dstream = author_counts_dstream.transform(
    lambda foo: foo
        .sortBy(lambda x: x[0].lower())
        .sortBy(lambda x: x[1], ascending=False))
author_counts_sorted_dstream.pprint()
```

But if I use take following the same pattern and try to pprint it:

```python
top_five = author_counts_sorted
```
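A likely cause of the error: the function passed to `transform` must return an RDD, but `RDD.take(n)` returns a plain Python list, so chaining `take` inside `transform` breaks the contract (the usual workarounds are `pprint(n)` on the DStream, or re-parallelizing the list). The two chained `sortBy` calls themselves rely on sort stability. Here is a plain-Python sketch of that logic (no Spark dependency; the sample data is invented for illustration):

```python
# Python's sort is stable, so sorting by author name first and then by
# count descending yields count-desc order with alphabetical tie-breaking,
# mirroring the two chained sortBy calls in the question.

author_counts = [("carol", 3), ("Alice", 5), ("bob", 5), ("dave", 1)]

by_name = sorted(author_counts, key=lambda x: x[0].lower())
by_count_desc = sorted(by_name, key=lambda x: x[1], reverse=True)

top_five = by_count_desc[:5]  # list slicing stands in for RDD.take(5)
# by_count_desc == [("Alice", 5), ("bob", 5), ("carol", 3), ("dave", 1)]
```

The last line is the key difference from the DStream case: slicing a list is fine in local Python, but inside `transform` the return value must remain an RDD for `pprint` to work on the resulting DStream.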