How to create SparkSession from existing SparkContext


Question


I have a Spark application which uses the new Spark 2.0 API with SparkSession. I am building this application on top of another application which uses SparkContext. I would like to pass the SparkContext to my application and initialize a SparkSession using the existing SparkContext.

However, I could not find a way to do that. I found that the SparkSession constructor taking a SparkContext is private, so I can't initialize it that way, and the builder does not offer any setSparkContext method. Do you think there is some workaround?


Answer 1:


As noted in the question, you cannot create a SparkSession directly because its constructor is private. Instead, you can create a SQLContext from the SparkContext and then get the SparkSession from the SQLContext, like this:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sparkContext)
val spark = sqlContext.sparkSession
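
As a quick sanity check (a minimal sketch; sparkContext is assumed to be the existing context handed over by the host application), the session obtained this way wraps that same context rather than starting a new one:

// The session shares the original SparkContext; no second context is started
assert(spark.sparkContext eq sparkContext)
spark.range(5).show()  // the session is fully usable for DataFrame work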

Hope this helps




Answer 2:


Apparently there is no way to initialize a SparkSession from an existing SparkContext.




Answer 3:


import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public JavaSparkContext getSparkContext()
{
        SparkConf conf = new SparkConf()
                    .setAppName("appName")
                    .setMaster("local[*]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        return jsc;
}

public SparkSession getSparkSession()
{
        // SparkSession's constructor is private[sql] in Scala, which
        // compiles to a public constructor in bytecode, so it is
        // callable from Java (though not from Scala user code)
        SparkSession sparkSession = new SparkSession(getSparkContext().sc());
        return sparkSession;
}


You can also try using the builder:

public SparkSession getSparkSession()
{
        SparkConf conf = new SparkConf()
                        .setAppName("appName")
                        .setMaster("local");

        SparkSession sparkSession = SparkSession
                                    .builder()
                                    .config(conf)
                                    .getOrCreate();
        return sparkSession;
}



Answer 4:


Deriving the SparkSession object from a SparkContext or even a SparkConf is easy; you might just find the API slightly convoluted. Here's an example (I'm using Spark 2.4, but this should work in the older 2.x releases as well):

// If you already have SparkContext stored in `sc`
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()

// Another example which builds a SparkConf, SparkContext and SparkSession
val conf = new SparkConf().setAppName("spark-test").setMaster("local[2]")
val sc = new SparkContext(conf)
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()
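
A follow-up point on getOrCreate (a minimal sketch reusing the spark and sc values from the snippet above): the builder returns the already-active session and its underlying context instead of creating new ones, which is what makes this pattern safe when another application owns the SparkContext:

// getOrCreate() is idempotent per JVM: a second call returns the
// same active session, backed by the existing SparkContext
val again = SparkSession.builder.getOrCreate()
assert(again eq spark)
assert(spark.sparkContext eq sc)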

Hope that helps!




Answer 5:


val sparkSession = SparkSession.builder.config(sc.getConf).getOrCreate()



Answer 6:


You would have noticed that we are using SparkSession and SparkContext, and this is not an error. Let's revisit the annals of Spark history for perspective. It is important to understand where we came from, as you will hear about these connection objects for some time to come.

Prior to Spark 2.0.0, the three main connection objects were SparkContext, SQLContext, and HiveContext. The SparkContext object was the connection to a Spark execution environment and was used to create RDDs and so on, SQLContext worked with Spark SQL behind SparkContext, and HiveContext interacted with the Hive stores.

Spark 2.0.0 introduced Datasets/DataFrames as the main distributed data abstraction interface and the SparkSession object as the entry point to a Spark execution environment. Appropriately, the SparkSession object is found in the namespace org.apache.spark.sql.SparkSession (Scala) or pyspark.sql.SparkSession (Python). A few points to note are as follows:

In Scala and Java, Datasets form the main data abstraction as typed data; however, for Python and R (which do not have compile-time type checking), the data...

https://www.packtpub.com/mapt/book/big_data_and_business_intelligence/9781785889271/4/ch04lvl1sec31/sparksession-versus-sparkcontext
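
To make the "single entry point" idea concrete, here is a minimal sketch (the app name and local master are illustrative assumptions) showing that the older connection objects remain reachable from a SparkSession:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("unified-entry-point")  // hypothetical app name
  .master("local[*]")
  .getOrCreate()

val sc = spark.sparkContext        // the pre-2.0 SparkContext
val sqlContext = spark.sqlContext  // the pre-2.0 SQLContext
// Hive access, formerly via HiveContext, is enabled with .enableHiveSupport()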



Source: https://stackoverflow.com/questions/42935242/how-to-create-sparksession-from-existing-sparkcontext
