DStream: all items with identical keys should be processed sequentially
Question: I have a DStream of (Key, Value) pairs:

```scala
mapped2.foreachRDD(rdd => {
  rdd.foreachPartition(p => {
    p.foreach(x => {
      // process x
    })
  })
})
```

I need to guarantee that all items with the same key are processed in one partition and by one core, so that they are effectively processed sequentially. How can I do this? Can I use groupByKey, even though it is inefficient?

Answer 1: You can use PairDStreamFunctions.combineByKey:

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.dstream.DStream

/**
 * Created by Yuval
```
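The answer's code is cut off above, but the key-co-location guarantee it relies on comes from the `Partitioner` argument: Spark's `HashPartitioner` routes every record to `key.hashCode mod numPartitions` (forced non-negative), so two records with equal keys always land in the same partition and are iterated sequentially by the core that owns it. A minimal plain-Scala sketch of that invariant; `partitionFor` is a hypothetical helper that mirrors the logic of `HashPartitioner.getPartition`, so no Spark runtime is needed to see the idea:

```scala
// Sketch: why hash partitioning co-locates identical keys.
object KeyColocation {
  // Hypothetical helper mirroring Spark's HashPartitioner.getPartition:
  // partition = key.hashCode mod numPartitions, adjusted to be non-negative
  // (Scala's % can return a negative result for negative hash codes).
  def partitionFor(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  def main(args: Array[String]): Unit = {
    val numPartitions = 4
    val records = Seq(("user1", 10), ("user2", 20), ("user1", 30), ("user1", 40))

    // Collect the partition index each "user1" record would be routed to.
    val user1Partitions = records
      .collect { case (k, _) if k == "user1" => partitionFor(k, numPartitions) }
      .distinct

    // All "user1" records map to exactly one partition index, so within a
    // batch they are processed sequentially by whichever core owns it.
    println(s"distinct partitions for user1: ${user1Partitions.size}")
  }
}
```

On a DStream, the same routing is requested by passing the partitioner explicitly, e.g. (a hypothetical invocation, with `n` as your partition count) `mapped2.combineByKey[Seq[Int]](v => Seq(v), (acc, v) => acc :+ v, (a, b) => a ++ b, new HashPartitioner(n))`; the per-key sequential ordering then holds within each batch, not across batches.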