What is the recommended way to end a Spark job inside a conditional statement?
I am doing validation on my data, and if the validation fails, I want to end the Spark job gracefully.
There is nothing to say that you can't call `stop` inside an `if` statement, but there is very little reason to do so, and it is probably a mistake. It seems implicit in your question that you may be attempting to open multiple Spark sessions.
The Spark session is intended to be left open for the life of the program. If you try to start two, you will find that Spark throws an exception and prints some background to the logs, including a JIRA ticket that discusses the topic.
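As a minimal sketch of why you never need a second session (the app name here is arbitrary), `SparkSession.builder().getOrCreate()` simply hands back the session that is already running instead of opening another one:

```scala
import org.apache.spark.sql.SparkSession

// getOrCreate() returns the session already running in this JVM,
// so there is never a reason to construct a second one.
val spark  = SparkSession.builder().appName("my-job").getOrCreate()
val spark2 = SparkSession.builder().getOrCreate()

println(spark eq spark2) // true - both references point to the same session
```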
If you wish to run multiple Spark tasks, you may submit them to the same context. One context can run multiple tasks at once.
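Here is a rough sketch of that, assuming a Scala application that uses `scala.concurrent.Future` to submit two independent jobs to the one shared session:

```scala
import org.apache.spark.sql.SparkSession
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val spark = SparkSession.builder().appName("multi-job").getOrCreate()

// Two independent jobs submitted to the same context; Spark schedules both.
val jobA = Future { spark.range(0, 1000000).filter("id % 2 = 0").count() }
val jobB = Future { spark.range(0, 1000000).filter("id % 3 = 0").count() }

val (evens, threes) = Await.result(jobA.zip(jobB), 10.minutes)
println(s"evens=$evens, multiples of three=$threes")
```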
So once you have called spark.stop(), you can't use any SparkContext-related objects or functions to create RDDs or DataFrames, or anything else. If you try to use the same SparkSession again later in the program, you will hit the exception described above. For example:
```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder().appName("stop-in-condition").getOrCreate()
val sc = spark.sparkContext

val schema = StructType(Seq(
  StructField("fname", StringType),
  StructField("lname", StringType),
  StructField("age", StringType)))

val rdd = sc.parallelize(Seq(Row("RAMA", "DAS", "25"), Row("smritu", "ranjan", "26")))
val df = spark.createDataFrame(rdd, schema)
df.show() // works fine

// "RAMA" != "MAA", so the else branch runs and stops the session
if (df.select("fname").collect()(0).getAs[String]("fname") == "MAA") {
  println("continue")
} else {
  spark.stop() // stopping the SparkSession
  println("inside stopping condition")
}

println("code continues")

val rdd1 = sc.parallelize(Seq(Row("afdaf", "DAS", "56"), Row("sadfeafe", "adsadaf", "27")))
// throws an exception because the SparkContext has been shut down
val df1 = spark.createDataFrame(rdd1, schema)
df1.show()
```
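If the real goal is just to end the job gracefully when validation fails, one possible shape (only a sketch - the input path, the `fname` column, and the emptiness check are placeholders for your own validation) is to let the check gate the rest of the work and call `stop()` once, at the very end:

```scala
import org.apache.spark.sql.SparkSession

object ValidateAndRun {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("validate-and-run").getOrCreate()
    try {
      // Hypothetical validation: require at least one row in the input.
      val input = spark.read.parquet(args(0))
      if (input.isEmpty) {
        println("validation failed - skipping the rest of the job")
      } else {
        // The remaining work runs only when validation passes,
        // so nothing ever touches a stopped session.
        input.groupBy("fname").count().show()
      }
    } finally {
      spark.stop() // stop once, at the very end of the program
    }
  }
}
```

That way the session stays open for the life of the program, and the conditional decides what work happens rather than when the session dies.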