问题
ALready Looked at Spark streaming not remembering previous state but doesn't help. Also looked at http://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing but cant find JavaStreamingContextFactory although I am using spark streaming 2.11 v 2.0.1
My code works fine but when I restart it... it won't remember the last checkpoint...
Function0<JavaStreamingContext> scFunction = new Function0<JavaStreamingContext>() {
@Override
public JavaStreamingContext call() throws Exception {
//Spark Streaming needs to checkpoint enough information to a fault- tolerant storage system such
JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.milliseconds(SPARK_DURATION));
//checkpointDir = "hdfs://user:pw@192.168.1.50:54310/spark/checkpoint";
ssc.sparkContext().setCheckpointDir(checkpointDir);
StorageLevel.MEMORY_AND_DISK();
return ssc;
}
};
JavaStreamingContext ssc = JavaStreamingContext.getOrCreate(checkpointDir, scFunction);
Currently data is streaming from kafka and I am performing some transformation and action.
JavaPairDStream<Integer, Long> responseCodeCountDStream = logObject.transformToPair
(MainApplication::responseCodeCount);
JavaPairDStream<Integer, Long> cumulativeResponseCodeCountDStream = responseCodeCountDStream.updateStateByKey
(COMPUTE_RUNNING_SUM);
cumulativeResponseCodeCountDStream.foreachRDD(rdd -> {
rdd.checkpoint();
LOG.warn("Response code counts: " + rdd.take(100));
});
Could somebody point me to right direction, if I am missing something?
Also, I can see that checkpoint is being saved in hdfs. But why wont it read from it?
来源:https://stackoverflow.com/questions/40535839/spark-checkpoint-doesnt-remember-state-java-hdfs