I run Spark 1.4.1 on Amazon AWS EMR 4.0.0.
For some reason, Spark's saveAsTextFile is much slower on EMR 4.0.0 than on EMR 3.8 (it used to take 5 sec, now it takes 95 sec).
To solve the problem, I added the following settings to mapred-site.xml, as suggested by Neil Jonkers on user@spark.apache.org:
<property>
  <name>mapred.output.direct.EmrFileSystem</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.direct.NativeS3FileSystem</name>
  <value>true</value>
</property>
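For reference, Hadoop reads these <property> entries from inside the file's top-level <configuration> element. Below is a minimal sketch in standard-library Python that renders the same two properties in that layout, which can help if you generate the file instead of editing it by hand. The helper name mapred_site_xml is hypothetical and is not part of Hadoop or EMR.

```python
import xml.etree.ElementTree as ET

# The two properties from the post; Hadoop stores every value as a string.
DIRECT_OUTPUT_PROPS = {
    "mapred.output.direct.EmrFileSystem": "true",
    "mapred.output.direct.NativeS3FileSystem": "true",
}


def mapred_site_xml(props):
    """Hypothetical helper: render props as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # encoding="unicode" makes tostring return a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(mapred_site_xml(DIRECT_OUTPUT_PROPS))
```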
It can also be done by adding the following to the aws command:
classification=mapred-site,properties=[mapred.output.direct.EmrFileSystem=true,mapred.output.direct.NativeS3FileSystem=true]
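The shorthand above follows the pattern classification=<file>,properties=[key=value,...]. The sketch below rebuilds that exact string from the same property map, so the shorthand and the XML/JSON forms stay in sync. It assumes Python 3.7+, which preserves dict insertion order, and the helper name shorthand is hypothetical.

```python
# The two properties from the post, in the order they appear in the shorthand.
DIRECT_OUTPUT_PROPS = {
    "mapred.output.direct.EmrFileSystem": "true",
    "mapred.output.direct.NativeS3FileSystem": "true",
}


def shorthand(classification, props):
    """Hypothetical helper: build a classification=...,properties=[...] string."""
    pairs = ",".join(f"{key}={value}" for key, value in props.items())
    return f"classification={classification},properties=[{pairs}]"


if __name__ == "__main__":
    print(shorthand("mapred-site", DIRECT_OUTPUT_PROPS))
```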
or by adding the following to the configuration JSON file:
{
  "Classification": "mapred-site",
  "Properties": {
    "mapred.output.direct.EmrFileSystem": "true",
    "mapred.output.direct.NativeS3FileSystem": "true"
  }
}
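As far as I know, the configuration file passed with aws emr create-cluster --configurations file://./configurations.json holds a JSON array of classification objects, so the object above normally sits inside [ ... ] next to any other classifications. The following is a hedged sketch that prints such an array. The helper emr_configurations and the file name configurations.json are assumptions for illustration.

```python
import json

# The two properties from the post.
DIRECT_OUTPUT_PROPS = {
    "mapred.output.direct.EmrFileSystem": "true",
    "mapred.output.direct.NativeS3FileSystem": "true",
}


def emr_configurations(props):
    """Hypothetical helper: a list holding one mapred-site classification object."""
    return [{"Classification": "mapred-site", "Properties": dict(props)}]


if __name__ == "__main__":
    # Print the array; redirect stdout into the file handed to --configurations.
    print(json.dumps(emr_configurations(DIRECT_OUTPUT_PROPS), indent=2))
```

Usage (the script name is hypothetical): python make_configurations.py > configurations.json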