How should you end a Spark job inside an if statement?

青春惊慌失措 2021-01-16 19:04

What is the recommended way to end a Spark job inside a conditional statement?

I am doing validation on my data, and if the validation fails, I want to end the Spark job gracefully.

2 Answers
  • 2021-01-16 19:26

    There is nothing to say you can't call `stop()` inside an if statement, but there is very little reason to, and doing so is probably a mistake. It seems implicit in your question that you may be attempting to open multiple Spark sessions.

    The Spark session is intended to be left open for the life of the program. If you try to start a second one, Spark throws an exception and writes some background to the logs, including a reference to a JIRA ticket that discusses the topic.
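
    The usual way to avoid ever hitting that exception is `SparkSession.builder().getOrCreate()`, which hands back the session that is already active instead of building a second context. A minimal sketch (the app names and master URL below are illustrative only):

    ```scala
    import org.apache.spark.sql.SparkSession

    // getOrCreate() returns the already-active session instead of creating a new one,
    // so there is never more than one SparkContext in the JVM.
    val spark1 = SparkSession.builder().appName("first").master("local[*]").getOrCreate()
    val spark2 = SparkSession.builder().appName("second").getOrCreate()

    assert(spark1.sparkContext eq spark2.sparkContext)   // same underlying context
    ```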

    If you wish to run multiple Spark tasks, submit them to the same context; one context can run multiple tasks at once.
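
    A minimal sketch of that pattern (the datasets and names here are made up): one session is created once, several independent jobs run against it, and `stop()` is called only when all of the work is finished.

    ```scala
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("multi-job").master("local[*]").getOrCreate()
    import spark.implicits._

    // Job 1: count one dataset
    val orders = Seq(("o1", 10.0), ("o2", 25.5)).toDF("id", "amount")
    println(orders.count())

    // Job 2: show another dataset, reusing the same session and context
    val users = Seq(("u1", "alice"), ("u2", "bob")).toDF("id", "name")
    users.show()

    spark.stop()   // stop once, at the end of the program
    ```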

  • 2021-01-16 19:27

    Once you stop the SparkSession, its SparkContext is stopped on that JVM as well; `sc` is no longer active.

    So you can no longer use any SparkContext-related objects or functions to create an RDD or DataFrame (or anything else). If you use the same SparkSession again later in the program flow, you will hit an exception such as `IllegalStateException: Cannot call methods on a stopped SparkContext`. For example:

    ```scala
    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    val spark = SparkSession.builder().appName("stop-example").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val schema = StructType(Seq(
      StructField("fname", StringType), StructField("lname", StringType), StructField("age", StringType)))

    val rdd = sc.parallelize(Seq(Row("RAMA", "DAS", "25"), Row("smritu", "ranjan", "26")))
    val df = spark.createDataFrame(rdd, schema)
    df.show()   // works fine

    if (df.select("fname").collect()(0).getAs[String]("fname") == "MAA") {
      println("continue")
    } else {
      spark.stop()   // stopping the SparkSession (and its SparkContext)
      println("inside stopping condition")
    }

    println("code continues")

    // Once spark.stop() has run, any further use of sc/spark throws
    // java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext
    val rdd1 = sc.parallelize(Seq(Row("afdaf", "DAS", "56"), Row("sadfeafe", "adsadaf", "27")))
    val df1 = spark.createDataFrame(rdd1, schema)
    df1.show()
    ```
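
    If the goal is to end the job gracefully when validation fails, one option (a sketch, not the only way; the validation rule and exit code below are made up) is to stop the session and leave `main` immediately, so nothing after the check ever touches the stopped context:

    ```scala
    import org.apache.spark.sql.SparkSession

    object ValidateThenRun {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("validate-then-run").master("local[*]").getOrCreate()
        import spark.implicits._

        val df = Seq(("RAMA", "DAS", "25"), ("smritu", "ranjan", "26")).toDF("fname", "lname", "age")
        val valid = df.filter($"fname" === "MAA").count() > 0   // hypothetical validation rule

        if (!valid) {
          println("validation failed, ending the Spark job gracefully")
          spark.stop()   // release executors and other resources
          sys.exit(1)    // leave main so no later code runs against the stopped context
        }

        // Only reached when validation passed; the session is still alive here.
        df.show()
        spark.stop()
      }
    }
    ```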
    