By default, newer versions of Spark use compression when saving text files. For example:
val txt = sc.parallelize(List("Hello", "world", "!"))
txt.saveAsT
With the following code, I can see the text file in HDFS without any compression:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local").setAppName("App name")
val sc = new SparkContext(conf)
// Turn off compression of Hadoop output files before saving
sc.hadoopConfiguration.set("mapred.output.compress", "false")
val txt = sc.parallelize(List("Hello", "world", "!"))
txt.saveAsTextFile("hdfs/path/to/save/file")
You can set any Hadoop-related property on sc.hadoopConfiguration.
Verified this code with Spark 1.5.2 (Scala 2.11).
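
If you need to set more than one Hadoop property, a rough sketch along the same lines might look like the one below. The object name and app name are made up for illustration. The second key is the newer Hadoop 2 name for the same compression switch; I am assuming your Hadoop version recognizes it. I have not run this sketch against Spark 1.5.2.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object name, used only for this sketch.
object HadoopConfSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("hadoop-conf-sketch")
    val sc = new SparkContext(conf)
    try {
      // Old-style (mapred.*) and new-style (mapreduce.*) keys for output compression.
      // Assumption: which key takes effect depends on the Hadoop version on the classpath.
      val settings = Map(
        "mapred.output.compress" -> "false",
        "mapreduce.output.fileoutputformat.compress" -> "false"
      )

      // Apply every property to the Hadoop configuration owned by the SparkContext.
      settings.foreach { case (key, value) => sc.hadoopConfiguration.set(key, value) }

      // Read the values back to check what the job will actually see.
      settings.keys.foreach { key =>
        println(s"$key = ${sc.hadoopConfiguration.get(key)}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Reading the values back before calling saveAsTextFile is a cheap way to confirm the setting was applied to the same SparkContext that writes the file.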