Question
What is the purpose of the getOrCreate method from the SparkContext class? I don't understand when we should use this method.
If I have two Spark applications that are run with spark-submit, and in the main method I instantiate the Spark context with SparkContext.getOrCreate, will both apps have the same context?
Or is the purpose simpler: when I create a Spark app and don't want to pass the Spark context as a parameter to a method, I can get it as a singleton object instead?
Answer 1:
"If I have two Spark applications that are run with spark-submit, and in the main method I instantiate the Spark context with SparkContext.getOrCreate, will both apps have the same context?"
No, SparkContext is a local object. It is not shared between applications.
"When I create a Spark app and don't want to pass the Spark context as a parameter to a method, I can get it as a singleton object instead?"
This is exactly the reason. SparkContext (or SparkSession) is ubiquitous in Spark applications and in core Spark's source, and passing it around would be a huge burden.
It is also useful for multithreaded applications, where an arbitrary thread can initialize the context.
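The get-or-create idea can be illustrated with a small, Spark-free sketch. Everything here (ToyContext, get_or_create, helper) is a hypothetical stand-in written for illustration; it is not Spark's implementation, only a toy model of one active instance per process that any function or thread can look up without receiving it as a parameter.

```python
import threading


class ToyContext:
    """Hypothetical stand-in for a one-per-process object (not Spark's class)."""

    _active = None              # the single active instance, if any
    _lock = threading.Lock()    # guards creation when several threads race

    def __init__(self, app_name):
        self.app_name = app_name

    @classmethod
    def get_or_create(cls, app_name="default"):
        # Return the active instance; create it only on the very first call.
        with cls._lock:
            if cls._active is None:
                cls._active = cls(app_name)
            return cls._active


def helper():
    # No context parameter: the helper simply looks up the active instance.
    return ToyContext.get_or_create()


first = ToyContext.get_or_create("my-app")

# Several threads call the helper; each one receives the same object.
seen = []
threads = [threading.Thread(target=lambda: seen.append(helper())) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(all(ctx is first for ctx in seen))
print(first.app_name)
```

In this sketch the name passed on later calls is ignored once an instance exists, and a second Python process running the same script would build its own separate instance, which mirrors the point that the context is local to one application.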
About the docs:
"This function may be used to get or instantiate a SparkContext and register it as a singleton object. Because we can only have one active SparkContext per JVM, this is useful when applications may wish to share a SparkContext."
The driver runs in its own JVM, and there is no built-in mechanism to share it between multiple full-fledged Java applications (that is, proper applications each executing their own main; see "Is there one JVM per Java application?" and "Why have one JVM per application?" for related general questions). In the docs, "application" refers to a "logical application" in which multiple modules execute their own code; one example is SparkJob on spark-jobserver. This scenario is no different from passing SparkContext to a function.
Source: https://stackoverflow.com/questions/47813646/sparkcontext-getorcreate-purpose