Is it possible to get the current spark context settings in PySpark?

孤街浪徒 2021-01-29 22:05

I'm trying to get the path to spark.worker.dir for the current SparkContext.

If I explicitly set it as a config param, I can read it back out of SparkConf, but is there a way to access the complete config (including all defaults) from PySpark?

13 Answers
  • 2021-01-29 22:22

    Unfortunately, no, the Spark platform as of version 2.3.1 does not provide any way to programmatically access the value of every property at run time. It provides several methods to access the values of properties that were explicitly set through a configuration file (like spark-defaults.conf), set through the SparkConf object when you created the session, or set through the command line when you submitted the job, but none of these methods will show the default value for a property that was not explicitly set. For completeness, the best options are:

    • The Spark application’s web UI, usually at http://<driver>:4040, has an “Environment” tab with a property value table.
    • The SparkContext keeps a hidden reference to its configuration in PySpark, and the configuration provides a getAll method: spark.sparkContext._conf.getAll().
    • Spark SQL provides the SET command that will return a table of property values: spark.sql("SET").toPandas(). You can also use SET -v to include a column with the property’s description.

    (These three methods all return the same data on my cluster.)
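
    For example, a minimal PySpark sketch (assuming an active SparkSession bound to the name spark) that reads the configuration through the last two routes:

    # Minimal sketch; assumes an existing SparkSession named `spark`.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Explicitly-set properties from the SparkContext's underlying SparkConf
    for key, value in sorted(spark.sparkContext._conf.getAll()):
        print(key, "=", value)

    # The same pairs via Spark SQL's SET command, returned as a DataFrame
    spark.sql("SET").show(truncate=False)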

  • 2021-01-29 22:23

    Not sure if you can get all the default settings easily, but specifically for the worker dir, it's quite straightforward:

    from pyspark import SparkFiles

    # Root directory that holds files distributed via SparkContext.addFile()
    print(SparkFiles.getRootDirectory())
    
  • 2021-01-29 22:23

    If you want to see the configuration in Databricks, use the command below:

    spark.sparkContext._conf.getAll()
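
    A minimal sketch (assuming a Databricks notebook, where spark and display() are already defined) that renders the same pairs as a sortable table:

    # Sketch for a Databricks notebook; `spark` and `display` are predefined there.
    pairs = spark.sparkContext._conf.getAll()  # list of (key, value) tuples
    display(spark.createDataFrame(pairs, ["key", "value"]))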
    
  • 2021-01-29 22:25

    You can use the following (Scala, where spark is the SparkSession):

    spark.sparkContext.getConf.getAll
    

    For example, I often have the following at the top of my Spark programs:

    logger.info(spark.sparkContext.getConf.getAll.mkString("\n"))
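
    Since the question is about PySpark, a rough Python equivalent (assuming sc is the active SparkContext) would be:

    # Rough PySpark equivalent; assumes `sc` is the active SparkContext.
    import logging

    logger = logging.getLogger(__name__)
    logger.info("\n".join(f"{k}={v}" for k, v in sc.getConf().getAll()))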
    
  • 2021-01-29 22:27

    Suppose I want to increase the driver memory at runtime using SparkSession:

    s2 = SparkSession.builder.config("spark.driver.memory", "29g").getOrCreate()
    

    Now I want to view the updated settings:

    s2.conf.get("spark.driver.memory")
    

    To get all the settings, you can make use of spark.sparkContext._conf.getAll()
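
    For example, a small sketch (assuming the s2 session created above) that turns those pairs into a plain dict for lookups:

    # Sketch; assumes the `s2` SparkSession created above.
    conf_dict = dict(s2.sparkContext._conf.getAll())
    print(conf_dict.get("spark.driver.memory"))  # e.g. "29g" if the setting was applied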

    Hope this helps

  • 2021-01-29 22:28

    Just for the record, the analogous Java version:

    import scala.Tuple2;

    // Assumes an existing SparkConf, e.g. sparkContext.getConf()
    Tuple2<String, String>[] pairs = sparkConf.getAll();
    for (Tuple2<String, String> pair : pairs) {
        System.out.println(pair._1() + " = " + pair._2());
    }
    