How to set and get static variables from Spark?

孤独总比滥情好 2020-12-06 07:09

I have a class like this:

public class Test {
    private static String name;

    public static String getName() {
        return name;
    }

    public static void setName(String name) {
        Test.name = name;
    }
}
4 answers
  • 2020-12-06 07:23

    The copy of your class in your driver process isn't the copy in your executors. They aren't in the same ClassLoader, or even the same JVM, or even on the same machine. Setting a static variable on the driver does nothing to the other copies, hence you find it null remotely.
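    Apart from the jar itself, what the driver sends to executors is serialized data (Spark ships task closures using Java serialization), and static fields are never part of an object's serialized form. The plain-JDK sketch below (no Spark; Payload and StaticSerializationDemo are hypothetical names) checks this by serializing the same object twice with different static values and comparing the bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Plain-JDK illustration (no Spark). Payload and StaticSerializationDemo are
// hypothetical names; Payload mirrors the static field of the question's Test class.
class Payload implements Serializable {
    private static final long serialVersionUID = 1L;

    static String name;          // class-level state: NOT part of any object's serialized form
    final String instanceValue;  // per-object state: serialized with the object

    Payload(String instanceValue) {
        this.instanceValue = instanceValue;
    }
}

class StaticSerializationDemo {
    // Serializes an object with Java serialization and returns the raw bytes.
    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // True if changing the static field leaves the serialized bytes unchanged,
    // i.e. the static value never travels with the object.
    static boolean staticIsIgnored() {
        Payload payload = new Payload("same instance data");
        Payload.name = "set on the driver";
        byte[] first = serialize(payload);
        Payload.name = "changed afterwards";
        byte[] second = serialize(payload);
        return Arrays.equals(first, second);
    }

    public static void main(String[] args) {
        System.out.println("static field left out of the payload: " + staticIsIgnored());
    }
}
```

    Because the bytes are identical, an executor deserializing them has no way to learn the driver's value of name; it only sees whatever its own copy of the class holds.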

  • 2020-12-06 07:27

    I would like to add one more point to DanielL's answer:

    When you declare a variable with the static keyword, the JVM initializes it when the class is loaded and initialized. So if you build a jar in which a Java/Scala class assigns initial values to its static fields, those initial values are stored in the jar and workers can use them directly. However, if you change the value of a static field in the driver program, workers only see the initial value compiled into the jar; your change is not reflected. To change it on the workers you would need to ship a new jar, or copy the class manually to all executors.

  • 2020-12-06 07:35

    OK, there are basically two ways to get a value known to the driver over to the executors:

    1. Put the value inside a closure that is serialized to the executors to perform a task. This is the most common approach and is very simple/elegant. Sample and doc here.
    2. Create a broadcast variable with the data. This is good for large immutable data, where you want it shipped to each executor only once instead of with every task. It is also good if the same data is used over and over. Sample and doc here.
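    Option 1 works because the captured value travels inside the serialized function itself. The plain-JDK sketch below (no Spark; ClosureDemo, SerFunction, roundTrip and makeGreeter are hypothetical names) serializes a lambda that captures a local value, deserializes it as an executor would, and applies it. Spark's Java function interfaces such as org.apache.spark.api.java.function.Function likewise extend Serializable. Broadcast variables need a running context (JavaSparkContext.broadcast(...) returns a Broadcast<T> whose value() you read inside the task), so they are not simulated here.

```java
import java.io.*;
import java.util.function.Function;

// Plain-JDK simulation of shipping a closure from driver to executor (no Spark).
// All names here are hypothetical, chosen for illustration.
class ClosureDemo {
    // A Function that is also Serializable, so lambdas targeting it can be serialized.
    interface SerFunction<A, B> extends Function<A, B>, Serializable {}

    // Serialize on the "driver", deserialize on the "executor".
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    static SerFunction<String, String> makeGreeter(String name) {
        // 'name' is captured by the lambda, so it is serialized along with it.
        return s -> s + ", " + name;
    }

    public static void main(String[] args) {
        SerFunction<String, String> onDriver = makeGreeter("Alice");
        SerFunction<String, String> onExecutor = roundTrip(onDriver);
        System.out.println(onExecutor.apply("Hello"));
    }
}
```

    No static field is involved: the value reaches the other side only because it is part of the function's serialized state.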

    Neither approach requires static variables. But if you DO want to have static values available on your executor JVMs, you need to do one of these:

    1. If the values are fixed, or the configuration is available on the executor nodes (it lives inside the jar, etc.), you can use a lazy val, which guarantees initialization only once.
    2. You can call mapPartitions() with code that uses one of the two options above, then store the values in your static variable/object. mapPartitions runs your function once per partition (much better than once per record), which makes it a good fit for this kind of thing (initializing DB connections, etc.).
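    To make option 2 concrete without a cluster, here is a plain-JDK simulation (no Spark; ExecutorState, processPartition, runOnPartitions, partitionCalls and initCount are hypothetical names). processPartition plays the role of the function you would hand to JavaRDD.mapPartitions(): it is called once per partition, receives the value through its argument (standing in for the closure), and stores it in a static field only if this JVM has not done so already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-JDK simulation of the mapPartitions() pattern; no Spark is involved.
// ExecutorState and its members are hypothetical names for illustration.
class ExecutorState {
    // Static state that, in Spark, would exist once per executor JVM.
    static String config;           // value stored for later use on this "executor"
    static int initCount = 0;       // how many times the expensive setup actually ran
    static int partitionCalls = 0;  // how many times the per-partition function ran

    // Plays the role of the function passed to mapPartitions(): called once per partition.
    static Iterator<String> processPartition(Iterator<String> rows, String valueFromClosure) {
        synchronized (ExecutorState.class) {
            partitionCalls++;
            if (config == null) {
                // Expensive setup (open a DB connection, parse config, ...) runs at most
                // once per JVM, and never once per row.
                config = valueFromClosure;
                initCount++;
            }
        }
        List<String> out = new ArrayList<>();
        while (rows.hasNext()) {
            out.add(config + ":" + rows.next());
        }
        return out.iterator();
    }

    // Drives the simulation: each inner list stands in for one partition.
    static List<String> runOnPartitions(List<List<String>> partitions, String value) {
        List<String> results = new ArrayList<>();
        for (List<String> partition : partitions) {
            processPartition(partition.iterator(), value).forEachRemaining(results::add);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> results = runOnPartitions(
                Arrays.asList(Arrays.asList("a", "b", "c"), Arrays.asList("d", "e")), "cfg");
        System.out.println(results + " partitionCalls=" + partitionCalls + " initCount=" + initCount);
    }
}
```

    With two partitions and five rows, the per-partition function runs twice and the setup runs once; doing the setup per row instead would run it five times.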

    Hope this helps!

    P.S.: As for your exception: I don't see it in that code sample; my bet is that it is occurring elsewhere.


    Edit for extra clarification: the lazy val solution is plain Scala, no Spark involved:

    object MyStaticObject
    {
      lazy val MyStaticValue = {
         // Call a database, read a file included in the Jar, do expensive initialization computation, etc
         4
      }
    } 
    

    Since each executor is its own JVM, MyStaticObject is initialized separately in every executor, the first time code running there accesses it. The lazy keyword guarantees that MyStaticValue is initialized only the first time it is actually requested, and that it holds that value from then on.
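    For Java code like the question's Test class, a common counterpart of the Scala object plus lazy val is the initialization-on-demand holder idiom. This is a swapped-in Java idiom, not the answer's code. The JVM initializes a nested class only on first use and does so thread-safely, so the expensive step runs once per loaded copy of the class, which in Spark typically means once per executor JVM. MyStaticObject, Holder and initCount are illustrative names; initCount exists only to make the single initialization observable.

```java
// Java counterpart of the Scala `object` + `lazy val` above, using the
// initialization-on-demand holder idiom. All names are illustrative.
final class MyStaticObject {
    static int initCount = 0; // demonstration only: how often the expensive init ran

    private MyStaticObject() {}

    // Holder is not initialized until myStaticValue() first touches it; the JVM
    // runs this initialization exactly once and makes it thread-safe.
    private static final class Holder {
        static final int MY_STATIC_VALUE = expensiveInit();
    }

    private static int expensiveInit() {
        initCount++;
        // Call a database, read a file included in the jar, do expensive computation, etc.
        return 4;
    }

    static int myStaticValue() {
        return Holder.MY_STATIC_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(myStaticValue() + " " + myStaticValue() + " initCount=" + initCount);
    }
}
```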

  • 2020-12-06 07:35

    I would like to add one more approach. It only makes sense if you have a few variables that can be passed at runtime as arguments.

    Spark configuration: pass --conf "spark.executor.extraJavaOptions=-DcustomField=${value}", and when you need the data in a transformation, call System.getProperty("customField");

    You can find more details here.
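    On the executor side this is just a JVM system property. A minimal sketch, assuming the property is named customField (ExecutorProps is a hypothetical helper); you would call it from inside a transformation, where it runs on the executor. Since extraJavaOptions here only applies to executor JVMs, code running on the driver would not see the property unless it is also passed to the driver (for example via spark.driver.extraJavaOptions).

```java
// Hypothetical helper for reading a value passed to executors with
//   --conf "spark.executor.extraJavaOptions=-DcustomField=<value>"
// The property name "customField" is only an example.
final class ExecutorProps {
    private ExecutorProps() {}

    // Returns the -DcustomField value of the current JVM, or defaultValue if the
    // flag was not passed (System.getProperty returns the default in that case).
    static String customField(String defaultValue) {
        return System.getProperty("customField", defaultValue);
    }

    public static void main(String[] args) {
        // Try it locally with: java -DcustomField=hello ExecutorProps
        System.out.println(customField("not set"));
    }
}
```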

    Note: the approach above does not make sense when you have a significant number of variables. In those cases, I would prefer @Daniel Langdon's approaches.
