Question
I have a Spark application which uses the new Spark 2.0 API with SparkSession. I am building this application on top of another application which uses SparkContext. I would like to pass the SparkContext to my application and initialize a SparkSession using the existing SparkContext.
However, I could not find a way to do that. I found that the SparkSession constructor taking a SparkContext is private, so I can't initialize it that way, and the builder does not offer any setSparkContext method. Is there a workaround?
Answer 1:
As noted in the question, you cannot construct a SparkSession directly because its constructor is private. Instead you can create a SQLContext using the SparkContext, and then get the SparkSession from the SQLContext like this:
val sqlContext = new SQLContext(sparkContext)
val spark = sqlContext.sparkSession
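For completeness, here is a minimal self-contained sketch of this approach; the SparkContext is created locally only for illustration, whereas in the question's scenario it would come from the host application:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Stand-in for the host application's pre-existing SparkContext
val sparkContext = new SparkContext(
  new SparkConf().setAppName("demo").setMaster("local[*]"))

val sqlContext = new SQLContext(sparkContext)
val spark = sqlContext.sparkSession

// Sanity check: the session wraps the very same SparkContext
assert(spark.sparkContext eq sparkContext)
spark.range(5).show()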
Hope this helps.
Answer 2:
Apparently there is no way to initialize a SparkSession from an existing SparkContext.
Answer 3:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public JavaSparkContext getSparkContext()
{
    SparkConf conf = new SparkConf()
            .setAppName("appName")
            .setMaster("local[*]");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    return jsc;
}

public SparkSession getSparkSession()
{
    // SparkSession's SparkContext constructor is private[sql] in Scala,
    // which compiles to a public member in bytecode, so Java code can call it
    SparkSession sparkSession = new SparkSession(getSparkContext().sc());
    return sparkSession;
}
You can also try using the builder:
public SparkSession getSparkSession()
{
    SparkConf conf = new SparkConf()
            .setAppName("appName")
            .setMaster("local");
    SparkSession sparkSession = SparkSession
            .builder()
            .config(conf)
            .getOrCreate();
    return sparkSession;
}
Answer 4:
Deriving the SparkSession object from a SparkContext or even a SparkConf is easy; you might just find the API slightly convoluted. Here's an example (I'm using Spark 2.4, but this should work in the older 2.x releases as well):
// If you already have SparkContext stored in `sc`
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()
// Another example which builds a SparkConf, SparkContext and SparkSession
val conf = new SparkConf().setAppName("spark-test").setMaster("local[2]")
val sc = new SparkContext(conf)
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()
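Applied to the question's scenario, the same pattern can be wrapped in a small helper that receives the host application's SparkContext; the method name below is illustrative, not from the original post:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

// Hypothetical entry point: the host application hands over its SparkContext,
// and getOrCreate builds a SparkSession on top of that same context
def initSession(sc: SparkContext): SparkSession =
  SparkSession.builder.config(sc.getConf).getOrCreate()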
Hope that helps!
Answer 5:
val sparkSession = SparkSession.builder.config(sc.getConf).getOrCreate()
Answer 6:
You would have noticed that we are using SparkSession and SparkContext, and this is not an error. Let's revisit the annals of Spark history for perspective. It is important to understand where we came from, as you will hear about these connection objects for some time to come.
Prior to Spark 2.0.0, the three main connection objects were SparkContext, SQLContext, and HiveContext. The SparkContext object was the connection to a Spark execution environment and created RDDs and other primitives, the SQLContext worked with Spark SQL in the background of the SparkContext, and the HiveContext interacted with Hive stores.
Spark 2.0.0 introduced Datasets/DataFrames as the main distributed data abstraction interface and the SparkSession object as the entry point to a Spark execution environment. Appropriately, the SparkSession object is found in the namespace org.apache.spark.sql.SparkSession (Scala) or pyspark.sql.SparkSession (Python). A few points to note are as follows:
In Scala and Java, Datasets form the main data abstraction as typed data; however, for Python and R (which do not have compile-time type checking), the data...
https://www.packtpub.com/mapt/book/big_data_and_business_intelligence/9781785889271/4/ch04lvl1sec31/sparksession-versus-sparkcontext
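To tie the excerpt back to the question: in Spark 2.x the older connection objects are still reachable from the unified SparkSession entry point, so code written against them keeps working. A brief sketch (app name and master are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("demo").master("local[*]").getOrCreate()

// The legacy connection objects hang off the unified entry point
val sc  = spark.sparkContext   // the underlying SparkContext
val sql = spark.sqlContext     // the SQLContext, kept for backward compatibility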
Source: https://stackoverflow.com/questions/42935242/how-to-create-sparksession-from-existing-sparkcontext