Spark Checkpoint doesn't remember state (Java HDFS)

女生的网名这么多〃 submitted on 2019-12-11 05:15:23

Question


I already looked at "Spark streaming not remembering previous state", but it doesn't help. I also looked at http://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing, but I can't find JavaStreamingContextFactory, even though I am using Spark Streaming (spark-streaming_2.11) version 2.0.1.

My code works fine, but when I restart it, it doesn't remember the state from the last checkpoint.

Function0<JavaStreamingContext> scFunction = new Function0<JavaStreamingContext>() {
    @Override
    public JavaStreamingContext call() throws Exception {
        // Spark Streaming needs to checkpoint enough information to a fault-tolerant
        // storage system such that it can recover from failures.
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.milliseconds(SPARK_DURATION));
        // checkpointDir = "hdfs://user:pw@192.168.1.50:54310/spark/checkpoint";
        ssc.sparkContext().setCheckpointDir(checkpointDir);
        StorageLevel.MEMORY_AND_DISK(); // return value is discarded, so this line has no effect
        return ssc;
    }
};

JavaStreamingContext ssc = JavaStreamingContext.getOrCreate(checkpointDir, scFunction);

Currently, data is streaming from Kafka, and I am performing some transformations and actions on it:

// Per-batch count for each response code
JavaPairDStream<Integer, Long> responseCodeCountDStream =
        logObject.transformToPair(MainApplication::responseCodeCount);
// Running total per response code, carried across batches by updateStateByKey
JavaPairDStream<Integer, Long> cumulativeResponseCodeCountDStream =
        responseCodeCountDStream.updateStateByKey(COMPUTE_RUNNING_SUM);
cumulativeResponseCodeCountDStream.foreachRDD(rdd -> {
    rdd.checkpoint();
    LOG.warn("Response code counts: " + rdd.take(100));
});
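COMPUTE_RUNNING_SUM is not shown in the question. A typical running-sum update function for updateStateByKey adds the counts that arrived for a key in the current batch to that key's previous state. The sketch below models only that arithmetic, assuming this is what COMPUTE_RUNNING_SUM does; the names RunningSumDemo and computeRunningSum are hypothetical. It uses java.util.Optional so it runs without Spark, whereas Spark 2.x's Java API passes org.apache.spark.api.java.Optional inside a Function2.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class RunningSumDemo {

    // Hypothetical stand-in for COMPUTE_RUNNING_SUM: adds the counts seen for one
    // key in the current batch to that key's previous total (0 if there is none).
    // java.util.Optional is used only so this runs without Spark on the classpath.
    public static Optional<Long> computeRunningSum(List<Long> newCounts, Optional<Long> previous) {
        long sum = previous.orElse(0L);
        for (long c : newCounts) {
            sum += c;
        }
        return Optional.of(sum);
    }

    public static void main(String[] args) {
        // Batch 1: the key arrives with counts 3 and 2, no prior state.
        Optional<Long> afterBatch1 = computeRunningSum(Arrays.asList(3L, 2L), Optional.empty());
        // Batch 2: the key arrives once more with count 4.
        Optional<Long> afterBatch2 = computeRunningSum(Collections.singletonList(4L), afterBatch1);
        // Batch 3: no new values; updateStateByKey still calls the function with an
        // empty list, so the previous total is carried forward unchanged.
        Optional<Long> afterBatch3 = computeRunningSum(Collections.emptyList(), afterBatch2);
        System.out.println(afterBatch1.get() + " " + afterBatch2.get() + " " + afterBatch3.get()); // prints "5 9 9"
    }
}
```

This carried-forward total is exactly the state that is lost when a restarted application builds a fresh context instead of recovering one from the checkpoint.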

Could somebody point me in the right direction if I am missing something?

Also, I can see that the checkpoint is being saved in HDFS. So why won't it read from it?
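For comparison, here is a minimal sketch of the pattern from the checkpointing section of the Spark Streaming programming guide, adapted to Spark 2.0.x, where JavaStreamingContext.getOrCreate takes a Function0<JavaStreamingContext> (as the code above already does). Two differences from the question's code look relevant. First, the factory calls ssc.checkpoint(dir), which enables metadata checkpointing; ssc.sparkContext().setCheckpointDir(dir) on its own only sets the directory for RDD checkpoints, which could explain why files show up in HDFS while getOrCreate never recovers. Second, every DStream, including the updateStateByKey step, is defined inside the factory so it belongs to the checkpointed graph. The class name, the CHECKPOINT_DIR value, the batch interval, and the socket source on localhost:9999 (standing in for the Kafka stream) are all hypothetical; the code needs spark-streaming_2.11 2.0.x on the classpath.

```java
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.Optional;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class CheckpointedResponseCodes {

    // Hypothetical checkpoint location; replace with your own HDFS path.
    private static final String CHECKPOINT_DIR = "hdfs://namenode:54310/spark/checkpoint";

    // Adds this batch's counts for a key to its previous running total.
    private static final Function2<List<Long>, Optional<Long>, Optional<Long>> COMPUTE_RUNNING_SUM =
            (newCounts, previous) -> {
                long sum = previous.isPresent() ? previous.get() : 0L;
                for (long c : newCounts) {
                    sum += c;
                }
                return Optional.of(sum);
            };

    // Called by getOrCreate only when no usable checkpoint metadata exists.
    private static JavaStreamingContext createContext() {
        SparkConf conf = new SparkConf().setAppName("ResponseCodeCounts");
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Enables metadata checkpointing and also sets the RDD checkpoint directory;
        // ssc.sparkContext().setCheckpointDir(...) alone sets only the latter.
        ssc.checkpoint(CHECKPOINT_DIR);

        // The whole DStream graph is defined here so it is saved with the checkpoint.
        // Hypothetical source: one response code per line on a local socket,
        // standing in for the Kafka stream in the question.
        JavaDStream<String> lines = ssc.socketTextStream("localhost", 9999);
        JavaPairDStream<Integer, Long> ones =
                lines.mapToPair(line -> new Tuple2<>(Integer.parseInt(line.trim()), 1L));
        JavaPairDStream<Integer, Long> batchCounts = ones.reduceByKey((a, b) -> a + b);
        JavaPairDStream<Integer, Long> runningCounts =
                batchCounts.updateStateByKey(COMPUTE_RUNNING_SUM);
        runningCounts.print();
        return ssc;
    }

    public static void main(String[] args) throws InterruptedException {
        // Recovers the context (and its state) from CHECKPOINT_DIR if metadata
        // checkpoints exist there; otherwise calls createContext().
        JavaStreamingContext ssc = JavaStreamingContext.getOrCreate(
                CHECKPOINT_DIR, CheckpointedResponseCodes::createContext);
        ssc.start();
        ssc.awaitTermination();
    }
}
```

With this layout, a restart should find the metadata checkpoint files in CHECKPOINT_DIR and skip createContext() entirely. Note that the programming guide also warns that a checkpoint written by one version of the application code may not be recoverable after that code changes.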

Source: https://stackoverflow.com/questions/40535839/spark-checkpoint-doesnt-remember-state-java-hdfs
