The Scala version of SparkContext has the property
sc.hadoopConfiguration
and I have successfully used it to set Hadoop properties in Scala. The PySpark SparkContext does not expose that property directly, but the underlying Java SparkContext is reachable through sc._jsc, so
sc._jsc.hadoopConfiguration().set('my.mapreduce.setting', 'someVal')
should work.
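A minimal, self-contained sketch of that approach (assuming a local PySpark setup, and reusing the placeholder property name from above):

from pyspark import SparkContext

sc = SparkContext('local[*]', 'hadoop-conf-demo')

# sc._jsc is the underlying JavaSparkContext (exposed via Py4J);
# hadoopConfiguration() returns the org.apache.hadoop.conf.Configuration Spark will use.
sc._jsc.hadoopConfiguration().set('my.mapreduce.setting', 'someVal')

# Read the value back to confirm it was applied.
print(sc._jsc.hadoopConfiguration().get('my.mapreduce.setting'))  # someVal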
You can set any Hadoop property with the --conf parameter when submitting the job, as long as you prefix it with spark.hadoop.:
--conf "spark.hadoop.fs.mapr.trace=debug"
Spark strips the spark.hadoop. prefix and copies the remainder into the Hadoop Configuration.
Source: https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L105
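The same spark.hadoop. prefix works when building the SparkConf programmatically rather than on the command line. A rough sketch (again using a placeholder property name):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster('local[*]').setAppName('spark-hadoop-prefix-demo')
# Keys prefixed with spark.hadoop. are copied, prefix stripped, into the Hadoop Configuration.
conf.set('spark.hadoop.my.mapreduce.setting', 'someVal')

sc = SparkContext(conf=conf)
print(sc._jsc.hadoopConfiguration().get('my.mapreduce.setting'))  # someVal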
I looked into the PySpark source code (context.py) and there is no direct equivalent. Instead, some specific methods accept a conf argument: a dict of Hadoop (key, value) pairs applied to that call, for example:
fileLines = sc.newAPIHadoopFile(
    'dev/*',
    'org.apache.hadoop.mapreduce.lib.input.TextInputFormat',
    'org.apache.hadoop.io.LongWritable',
    'org.apache.hadoop.io.Text',
    conf={'mapreduce.input.fileinputformat.input.dir.recursive': 'true'}
).count()
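A per-call conf dict is also accepted on the write side, e.g. by RDD.saveAsNewAPIHadoopFile. A rough sketch (the output path, output format, and compression property here are illustrative assumptions, not from the original answer):

pairs = sc.parallelize([(1, 'a'), (2, 'b')])
pairs.saveAsNewAPIHadoopFile(
    '/tmp/hadoop-conf-demo-output',  # hypothetical path; must not already exist
    'org.apache.hadoop.mapreduce.lib.output.TextOutputFormat',
    keyClass='org.apache.hadoop.io.IntWritable',
    valueClass='org.apache.hadoop.io.Text',
    conf={'mapreduce.output.fileoutputformat.compress': 'false'}
)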