I wrote a Spark Streaming program with a stateful transformation. My Spark Streaming application seems to be doing the computation correctly with checkpointing enabled. But if I terminate
As described in the checkpointing documentation, you have to adjust your code to be able to restore state from the checkpoints.
In particular, you cannot create the StreamingContext directly.
You have to use the StreamingContext.getOrCreate method instead, which takes the path to the checkpoint directory and a function of type
() => StreamingContext
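A minimal sketch of that pattern with a stateful transformation might look like the following. The object name, the checkpoint path, the socket host and port, the 5-second batch interval, and the updateCount helper are all assumptions made for illustration, not taken from the question.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StatefulWordCount {
  // Hypothetical values chosen for illustration only.
  val checkpointDirectory = "/tmp/spark-checkpoint"
  val host = "localhost"
  val port = 9999

  // Merge the counts seen in the current batch into the running total for a key.
  def updateCount(newValues: Seq[Int], running: Option[Int]): Option[Int] =
    Some(running.getOrElse(0) + newValues.sum)

  // Called only when no usable checkpoint exists.
  // All DStream setup has to happen here so it can be restored on restart.
  def createContext(): StreamingContext = {
    // The master is expected to be supplied by spark-submit.
    val conf = new SparkConf().setAppName("StatefulWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint(checkpointDirectory)

    val counts = ssc.socketTextStream(host, port)
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .updateStateByKey[Int](updateCount _)
    counts.print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Restore from the checkpoint if one exists, otherwise build a fresh context.
    val context = StreamingContext.getOrCreate(checkpointDirectory, createContext _)
    context.start()
    context.awaitTermination()
  }
}
```

When run locally, a socket receiver occupies one thread, so a master such as local[2] or higher is needed for the batches to be processed. If the DStream definitions are placed outside createContext, they are not part of what gets rebuilt from the checkpoint, which is a common reason state appears to be lost after a restart.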
According to the Spark Streaming documentation, you should initialize the context a bit differently:
// Function to create and setup a new StreamingContext
def functionToCreateContext(): StreamingContext = {
  val ssc = new StreamingContext(...)   // new context
  val lines = ssc.socketTextStream(...) // create DStreams
  ...
  ssc.checkpoint(checkpointDirectory)   // set checkpoint directory
  ssc
}
// Get StreamingContext from checkpoint data or create a new one
val context = StreamingContext.getOrCreate(checkpointDirectory, functionToCreateContext _)
// Do additional setup on context that needs to be done,
// irrespective of whether it is being started or restarted
context. ...
// Start the context
context.start()
context.awaitTermination()
See the checkpointing section of the Spark Streaming Programming Guide.