Avoid writing files for empty partitions in Spark Streaming
Question: I have a Spark Streaming job which reads data from Kafka partitions (one executor per partition). I need to save the transformed values to HDFS, but I have to avoid creating empty files. I tried using isEmpty, but this doesn't help when only some of the partitions are empty.

P.S. repartition is not an acceptable solution due to the performance degradation it causes.

Answer 1: The code works for PairRDD only.

Code for text:

    val conf = ssc.sparkContext.hadoopConfiguration
    conf.setClass(
      "mapreduce.output.lazyoutputformat.outputformat", // LazyOutputFormat.OUTPUT_FORMAT
      classOf[TextOutputFormat[Text, NullWritable]],
      classOf[OutputFormat[Text, NullWritable]])
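For context, here is a minimal self-contained sketch of how the snippet above can be used from a streaming job. The DStream name `stream`, the StreamingContext `ssc`, and the per-batch output path are illustrative assumptions, not part of the original answer. The key idea is that LazyOutputFormat defers creating a part file until the first record of a partition is actually written, so empty partitions produce no files at all:

    import org.apache.hadoop.io.{NullWritable, Text}
    import org.apache.hadoop.mapreduce.OutputFormat
    import org.apache.hadoop.mapreduce.lib.output.{LazyOutputFormat, TextOutputFormat}

    // Register the concrete format that LazyOutputFormat delegates to.
    // File creation is deferred until the first record is written.
    val conf = ssc.sparkContext.hadoopConfiguration
    conf.setClass(
      "mapreduce.output.lazyoutputformat.outputformat",
      classOf[TextOutputFormat[Text, NullWritable]],
      classOf[OutputFormat[Text, NullWritable]])

    // `stream` is an assumed DStream[String] built from the Kafka source.
    stream.foreachRDD { (rdd, batchTime) =>
      rdd
        .map(line => (new Text(line), NullWritable.get)) // LazyOutputFormat needs a PairRDD
        .saveAsNewAPIHadoopFile(
          s"/data/out/batch-${batchTime.milliseconds}",  // assumed per-batch directory
          classOf[Text],
          classOf[NullWritable],
          classOf[LazyOutputFormat[Text, NullWritable]],
          conf)
    }

Each micro-batch writes to its own directory here because FileOutputFormat fails if the output path already exists; within each batch directory, only the partitions that actually contained records leave part files behind.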