The Scala version of SparkContext has the property
sc.hadoopConfiguration
and I have successfully used it to set Hadoop properties in Scala. The PySpark SparkContext does not expose that property directly, but the underlying Java SparkContext is reachable through sc._jsc, so
sc._jsc.hadoopConfiguration().set('my.mapreduce.setting', 'someVal')
should work.
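A minimal, self-contained sketch of that approach (assuming a local PySpark setup, and reusing the placeholder property name from above):

from pyspark import SparkContext

sc = SparkContext('local[*]', 'hadoop-conf-demo')

# sc._jsc is the underlying JavaSparkContext (exposed via Py4J);
# hadoopConfiguration() returns the org.apache.hadoop.conf.Configuration Spark will use.
sc._jsc.hadoopConfiguration().set('my.mapreduce.setting', 'someVal')

# Read the value back to confirm it was applied.
print(sc._jsc.hadoopConfiguration().get('my.mapreduce.setting'))  # someVal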
You can set any Hadoop property with the --conf parameter when submitting the job, as long as you prefix it with spark.hadoop.:
--conf "spark.hadoop.fs.mapr.trace=debug"
Spark strips the spark.hadoop. prefix and copies the remainder into the Hadoop Configuration.
Source: https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L105
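The same spark.hadoop. prefix works when building the SparkConf programmatically rather than on the command line. A rough sketch (again using a placeholder property name):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster('local[*]').setAppName('spark-hadoop-prefix-demo')
# Keys prefixed with spark.hadoop. are copied, prefix stripped, into the Hadoop Configuration.
conf.set('spark.hadoop.my.mapreduce.setting', 'someVal')

sc = SparkContext(conf=conf)
print(sc._jsc.hadoopConfiguration().get('my.mapreduce.setting'))  # someVal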
I looked into the PySpark source code (context.py) and there is no direct equivalent. Instead, some specific methods accept a conf argument: a dict of Hadoop (key, value) pairs applied to that call, for example:
fileLines = sc.newAPIHadoopFile(
    'dev/*',
    'org.apache.hadoop.mapreduce.lib.input.TextInputFormat',
    'org.apache.hadoop.io.LongWritable',
    'org.apache.hadoop.io.Text',
    conf={'mapreduce.input.fileinputformat.input.dir.recursive': 'true'}
).count()
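A per-call conf dict is also accepted on the write side, e.g. by RDD.saveAsNewAPIHadoopFile. A rough sketch (the output path, output format, and compression property here are illustrative assumptions, not from the original answer):

pairs = sc.parallelize([(1, 'a'), (2, 'b')])
pairs.saveAsNewAPIHadoopFile(
    '/tmp/hadoop-conf-demo-output',  # hypothetical path; must not already exist
    'org.apache.hadoop.mapreduce.lib.output.TextOutputFormat',
    keyClass='org.apache.hadoop.io.IntWritable',
    valueClass='org.apache.hadoop.io.Text',
    conf={'mapreduce.output.fileoutputformat.compress': 'false'}
)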