How to set Hadoop configuration values from PySpark

生来不讨喜 2020-12-08 06:58

The Scala version of SparkContext has the property

sc.hadoopConfiguration

I have successfully used that to set Hadoop properties (in Scala). However, the Python version of SparkContext lacks that accessor. Is there any way to set Hadoop configuration values from PySpark?

3 Answers
  • 2020-12-08 07:19
    sc._jsc.hadoopConfiguration().set('my.mapreduce.setting', 'someVal')
    

    This should work.

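    A minimal, self-contained sketch of the same approach, assuming a plain local SparkContext (the app name is hypothetical, and the property name is just the one from the question):

    from pyspark import SparkContext

    sc = SparkContext(appName="hadoop-conf-demo")  # hypothetical app name

    # _jsc is the wrapped JavaSparkContext; hadoopConfiguration() returns the
    # org.apache.hadoop.conf.Configuration that this context uses.
    sc._jsc.hadoopConfiguration().set('my.mapreduce.setting', 'someVal')

    # Later Hadoop-based reads and writes on this context see the setting.
    print(sc._jsc.hadoopConfiguration().get('my.mapreduce.setting'))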
  • 2020-12-08 07:20

    You can set any Hadoop property using the --conf parameter when submitting the job. Spark copies every property prefixed with spark.hadoop. into the Hadoop configuration, with the prefix stripped.

    --conf "spark.hadoop.fs.mapr.trace=debug"
    

    Source: https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L105

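    For Spark versions with SparkSession, the same spark.hadoop.* prefix can also be set when building the session in code rather than on the command line. A rough sketch, using mapreduce.input.fileinputformat.input.dir.recursive purely as an example property:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hadoop-conf-via-spark-conf")  # hypothetical app name
             # Anything prefixed with spark.hadoop. is copied into the Hadoop
             # configuration with the prefix stripped.
             .config("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", "true")
             .getOrCreate())

    print(spark.sparkContext._jsc.hadoopConfiguration()
          .get("mapreduce.input.fileinputformat.input.dir.recursive"))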
  • 2020-12-08 07:30

    I looked into the PySpark source code (context.py) and there is no direct equivalent. Instead, some specific methods support passing in a dictionary of (key, value) pairs:

    fileLines = sc.newAPIHadoopFile(
        'dev/*',
        'org.apache.hadoop.mapreduce.lib.input.TextInputFormat',
        'org.apache.hadoop.io.LongWritable',
        'org.apache.hadoop.io.Text',
        conf={'mapreduce.input.fileinputformat.input.dir.recursive': 'true'}
    ).count()
    
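    For what it's worth, the records returned by newAPIHadoopFile are (key, value) pairs; with TextInputFormat the key is the byte offset and the value is the line text. A small follow-up sketch that keeps only the text (path and conf copied from the answer above):

    pairs = sc.newAPIHadoopFile(
        'dev/*',
        'org.apache.hadoop.mapreduce.lib.input.TextInputFormat',
        'org.apache.hadoop.io.LongWritable',
        'org.apache.hadoop.io.Text',
        conf={'mapreduce.input.fileinputformat.input.dir.recursive': 'true'}
    )

    # Each record comes back as an (offset, line) tuple; keep just the line text.
    lines = pairs.map(lambda kv: kv[1])
    print(lines.take(5))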