How to set and get static variables from Spark?

Submitted by 元气小坏坏 on 2019-12-29 06:22:53

Question


I have a class as this:

public class Test {
    private static String name;

    public static String getName() {
        return name;
    }

    public static void setName(String name) {
        Test.name = name;
    }

    public static void print() {
        System.out.println(name);
    }

}

In my Spark driver, I set the name like this and call the print() method:

public final class TestDriver{

    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf().setAppName("TestApp");
        // ...
        // ...
        Test.setName("TestName");
        Test.print();
        // ...
    }
}

However, I'm getting a NullPointerException. How do I pass a value to the global variable and use it?


Answer 1:


OK, there are basically two ways to get a value known to the driver to the executors:

  1. Put the value inside a closure that is serialized to the executors to perform a task. This is the most common approach and is simple and elegant. Sample and doc here.
  2. Create a broadcast variable with the data. This is good for large immutable data, where you want to guarantee it is sent only once. It is also good if the same data is used over and over. Sample and doc here.

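In Java terms, option 1 works because Spark Java-serializes the task closure, so any (effectively final) local variable it captures travels with it to the executor. Here is a minimal sketch of that same round trip in plain Java, with no Spark dependency (the `ClosureDemo` and `SerSupplier` names are just illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Supplier;

public class ClosureDemo {
    // A lambda type that is both a Supplier and Serializable, so it can be
    // written to bytes the way Spark ships task closures to executors.
    interface SerSupplier extends Supplier<String>, Serializable {}

    public static void main(String[] args) throws Exception {
        String name = "TestName";                 // captured by the closure
        SerSupplier task = () -> "name=" + name;

        // Serialize and deserialize the closure, as Spark would when
        // sending the task to an executor process.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(task);
        }
        SerSupplier onExecutor;
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buf.toByteArray()))) {
            onExecutor = (SerSupplier) in.readObject();
        }

        // The captured value traveled inside the serialized closure.
        System.out.println(onExecutor.get());     // prints name=TestName
    }
}
```

In real Spark code you never call the serialization machinery yourself; writing `rdd.foreach(x -> System.out.println(name + x))` with a local `name` triggers the same mechanism automatically.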
No need to use static variables in either case. But if you DO want to have static values available on your executor VMs, you need to do one of these:

  1. If the values are fixed, or the configuration is available on the executor nodes (it lives inside the jar, etc.), then you can have a lazy val, guaranteeing initialization happens only once.
  2. You can call mapPartitions() with code that uses one of the two options above, then store the values in your static variable/object. mapPartitions is guaranteed to run only once per partition (much better than once per record) and is good for this kind of thing (initializing DB connections, etc.).
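Option 1 above is Scala's lazy val; the Java counterpart (matching the question's language) is the initialization-on-demand holder idiom, which also guarantees a single, thread-safe initialization per JVM. A sketch, with illustrative names and a counter to make the once-only behavior visible:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyInitDemo {
    static final AtomicInteger inits = new AtomicInteger();

    // Initialization-on-demand holder idiom: VALUE is computed the first
    // time get() is called, exactly once per JVM (i.e., once per executor),
    // with thread safety provided by class loading itself.
    private static class Lazy {
        static final String VALUE = expensiveInit();
    }

    static String get() { return Lazy.VALUE; }

    private static String expensiveInit() {
        inits.incrementAndGet();
        return "loaded";   // stand-in for reading a file from the jar, a DB call, etc.
    }

    public static void main(String[] args) {
        System.out.println(get());
        System.out.println(get());
        System.out.println("inits=" + inits.get());  // prints inits=1
    }
}
```

Because the holder class is initialized lazily on each JVM that touches it, every executor gets its own correctly initialized copy without the driver having to send anything.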

Hope this helps!

P.S.: As for your exception: I just don't see it in that code sample; my bet is that it is occurring elsewhere.


Edit for extra clarification: The lazy val solution is simply Scala, no Spark involved...

object MyStaticObject
{
  lazy val MyStaticValue = {
     // Call a database, read a file included in the Jar, do expensive initialization computation, etc
     4
  }
} 

Since each executor corresponds to a JVM, the classes are loaded once per executor. The lazy keyword guarantees that the MyStaticValue variable will only be initialized the first time it is actually requested, and it will hold its value from then on.




Answer 2:


The copy of your class in the driver process isn't the copy in your executors. They aren't in the same ClassLoader, or even the same JVM, or even on the same machine. Setting a static variable on the driver does nothing to the other copies, which is why you find it null remotely.
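You can see the root cause without a cluster: Java serialization carries only instance state, never static fields, so the "remote" copy of the class comes up with the default value. A small single-JVM sketch (resetting the static simulates the fresh executor JVM; the class names are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class StaticDemo {
    static class Payload implements Serializable {
        static String staticName;   // belongs to the class: one copy per JVM/ClassLoader
        String instanceName;        // belongs to the object: travels when serialized
    }

    public static void main(String[] args) throws Exception {
        Payload p = new Payload();
        Payload.staticName = "TestName";
        p.instanceName = "TestName";

        // Serialize the object, as Spark does when shipping it to an executor.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(p);
        }

        // A fresh executor JVM starts with the static field at its default
        // value; we simulate that here by resetting it before deserializing.
        Payload.staticName = null;

        Payload remote;
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buf.toByteArray()))) {
            remote = (Payload) in.readObject();
        }

        System.out.println("instance=" + remote.instanceName); // instance=TestName
        System.out.println("static=" + Payload.staticName);    // static=null
    }
}
```

This is exactly the NullPointerException pattern from the question: the driver's `setName` mutates a static field that only exists in the driver's class copy.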



Source: https://stackoverflow.com/questions/29685330/how-to-set-and-get-static-variables-from-spark
