Is it possible to get the current spark context settings in PySpark?


I'm trying to get the path to spark.worker.dir for the current SparkContext.

If I explicitly set it as a config param, I can read it back out of SparkConf, but is there a way to access the complete config (including all defaults) from PySpark?

13 Answers
  • 2021-01-29 22:31

    Spark 1.6+ (Scala):

    sc.getConf.getAll.foreach(println)
    
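    In PySpark the rough equivalent would be the following (a minimal sketch, assuming an existing SparkContext named sc):

    # Print every (key, value) pair of the current SparkConf
    for key, value in sc.getConf().getAll():
        print(key, value)
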
  • 2021-01-29 22:31

    For a complete overview of your Spark environment and configuration I found the following code snippets useful:

    SparkContext:

    for item in sorted(sc._conf.getAll()): print(item)
    

    Hadoop Configuration:

    hadoopConf = {}
    iterator = sc._jsc.hadoopConfiguration().iterator()
    while iterator.hasNext():
        prop = iterator.next()
        hadoopConf[prop.getKey()] = prop.getValue()
    for item in sorted(hadoopConf.items()): print(item)
    

    Environment variables:

    import os
    for item in sorted(os.environ.items()): print(item)
    
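    If you want all three dumps in one place, they can be bundled into a small helper (a sketch only; the function name dump_spark_environment is made up for illustration and an existing SparkContext named sc is assumed):

    import os

    def dump_spark_environment(sc):
        # Spark configuration: a list of (key, value) tuples
        for key, value in sorted(sc._conf.getAll()):
            print('spark ', key, '=', value)
        # Hadoop configuration, iterated through the JVM gateway
        it = sc._jsc.hadoopConfiguration().iterator()
        while it.hasNext():
            prop = it.next()
            print('hadoop', prop.getKey(), '=', prop.getValue())
        # Environment variables of the driver process
        for key, value in sorted(os.environ.items()):
            print('env   ', key, '=', value)
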
  • 2021-01-29 22:31

    For Spark 2+, when using Scala, you can also use:

    spark.conf.getAll  // spark is the SparkSession
    
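    From PySpark, a similar listing is available through Spark SQL (a sketch, assuming a SparkSession named spark):

    # "SET -v" returns a DataFrame of key / value / meaning rows for the SQL configuration
    spark.sql("SET -v").show(truncate=False)
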
  • 2021-01-29 22:33

    Yes: sc.getConf().getAll()

    Which uses the method:

    SparkConf.getAll()
    

    as accessed by

    SparkContext.getConf()
    

    Note: sc._conf (with the underscore) also works; getConf() simply copies that private attribute. The underscore is what makes this tricky. I had to look at the Spark source code to figure it out ;)

    But it does work:

    In [4]: sc.getConf().getAll()
    Out[4]:
    [(u'spark.master', u'local'),
     (u'spark.rdd.compress', u'True'),
     (u'spark.serializer.objectStreamReset', u'100'),
     (u'spark.app.name', u'PySparkShell')]
    
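    To pull out a single key, such as the spark.worker.dir from the question, SparkConf also has a get method that returns None (or a supplied default) when the key was never set:

    # Only returns a value if spark.worker.dir was actually configured
    sc.getConf().get('spark.worker.dir')
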
  • 2021-01-29 22:34

    Spark 2.1+

    spark.sparkContext.getConf().getAll(), where spark is your SparkSession (this gives you a list of (key, value) pairs with all configured settings)

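    Since the result is a list of pairs, converting it to a dict makes lookups convenient (a minimal sketch, assuming a SparkSession named spark):

    # Build a plain dict from the (key, value) pairs for easy lookups
    conf = dict(spark.sparkContext.getConf().getAll())
    print(conf.get('spark.app.name'))
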
  • 2021-01-29 22:37

    Updating the configuration in Spark 2.3.1

    To change the default Spark configuration you can follow these steps:

    Import the required classes

    from pyspark.conf import SparkConf
    from pyspark.sql import SparkSession
    

    Get the default configurations

    spark.sparkContext._conf.getAll()
    

    Update the default configurations

    conf = spark.sparkContext._conf.setAll([
        ('spark.executor.memory', '4g'),
        ('spark.app.name', 'Spark Updated Conf'),
        ('spark.executor.cores', '4'),
        ('spark.cores.max', '4'),
        ('spark.driver.memory', '4g'),
    ])
    

    Stop the current Spark Session

    spark.sparkContext.stop()
    

    Create a Spark Session

    spark = SparkSession.builder.config(conf=conf).getOrCreate()
    
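    Once the new session is up, you can check that the settings took effect (a sketch; it assumes the configuration set in the steps above):

    # Should report the values configured above, e.g. '4g' and 'Spark Updated Conf'
    print(spark.sparkContext.getConf().get('spark.executor.memory'))
    print(spark.sparkContext.getConf().get('spark.app.name'))
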