Question
I am trying to transform an RDD inside a DStream by taking the log message with the maximum timestamp and appending a modified duplicate of it back to the RDD.
Please note I am using ssc.checkpoint(); the error goes away if I comment that line out.
The following is the example code:
JavaDStream<LogMessage> logMessageWithHB = logMessageMatched.transform(
        new Function<JavaRDD<LogMessage>, JavaRDD<LogMessage>>() {
    @Override
    public JavaRDD<LogMessage> call(JavaRDD<LogMessage> logMessageJavaRDD) throws Exception {
        if (!logMessageJavaRDD.isEmpty()) {
            // Pick the log message with the maximum timestamp and modify it
            LogMessage max = logMessageJavaRDD.max(ComparatorLogMessage.class.newInstance());
            List<LogMessage> tempList = new ArrayList<LogMessage>();
            max.convertToHBLogMessage();
            tempList.add(max);
            // ssc is the outer JavaStreamingContext, captured by this anonymous class
            JavaRDD<LogMessage> parallelize = ssc.sparkContext().parallelize(tempList);
            JavaRDD<LogMessage> union = logMessageJavaRDD.union(parallelize);
            return union;
        } else {
            return logMessageJavaRDD;
        }
    }
});
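From the serialization stack below, the anonymous Function appears to hold the outer JavaStreamingContext as the field val$ssc, and enabling checkpointing forces the whole DStream graph, including that captured context, through Java serialization. For comparison, here is a minimal, untested sketch of the same transform that derives the context from the incoming RDD instead of capturing ssc (only the parallelize line really changes):
import java.util.Collections;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

JavaDStream<LogMessage> logMessageWithHB = logMessageMatched.transform(
        new Function<JavaRDD<LogMessage>, JavaRDD<LogMessage>>() {
    @Override
    public JavaRDD<LogMessage> call(JavaRDD<LogMessage> logMessageJavaRDD) throws Exception {
        if (logMessageJavaRDD.isEmpty()) {
            return logMessageJavaRDD;
        }
        LogMessage max = logMessageJavaRDD.max(ComparatorLogMessage.class.newInstance());
        max.convertToHBLogMessage();
        // Derive the SparkContext from the RDD itself, so the closure does not
        // capture the non-serializable JavaStreamingContext
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(logMessageJavaRDD.context());
        JavaRDD<LogMessage> parallelize = jsc.parallelize(Collections.singletonList(max));
        return logMessageJavaRDD.union(parallelize);
    }
});
Both JavaRDD.context() and JavaSparkContext.fromSparkContext() are part of the public Spark Java API, so nothing from the enclosing driver scope needs to be serialized.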
Please note the following is my Spark driver code; the object is created in a different class, as explained in the question: java.io.NotSerializableException in Spark Streaming with enabled checkpointing.
public class SparkDriver implements Serializable {
    public static boolean debugFlag = true;
    public static ParseConfiguration configObj;

    private SparkConf sparkConf;
    private JavaStreamingContext ssc;

    public void initializeSpark(ParseConfiguration config) {
        sparkConf = new SparkConf();
        sparkConf.setMaster("local").setAppName("SPARKNGLA");
        // duration (batch interval in seconds) is read from the parsed
        // configuration; its definition is elided here
        ssc = new JavaStreamingContext(sparkConf, new Duration(duration * 1000));
        ssc.checkpoint("checkpoint");
    }

    public JavaStreamingContext getSsc() {
        return ssc;
    }
}
The main function which executes this class is as follows:
public static void main(String args[]) throws Exception {
    SparkDriver snObj = new SparkDriver();
    ParseConfiguration config = new ParseConfiguration();
    config.parseConfiguration(CONFIGURATION_FILE);
    SparkDriver.configObj = config;
    snObj.initializeSpark(config);

    Type9ViolationChecker violationChecker = new Type9ViolationChecker();
    violationChecker.initiateBroadCastVariables(snObj.getSsc().sparkContext());
    violationChecker.readModel(cl, cm);
    violationChecker.broadcastVariables(snObj.getSsc().sparkContext());

    JavaDStream<String> lines = KafkaUtil.getKafkaDStream(
            SparkDriver.configObj.getParsedLogsChannel(),
            SparkDriver.configObj.getKafkaServer(),
            SparkDriver.configObj.getKafkaPort(),
            snObj.getSsc());

    // Execute all Spark operations here *** THIS IS WHERE THE ERROR HAPPENS ***
    snObj.getSsc().start();
    snObj.getSsc().awaitTermination();
}
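As a side note, before calling start() I sometimes round-trip suspicious closures through plain Java serialization to surface captures like this early; a hypothetical helper using only the JDK:
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;

// Hypothetical helper: throws NotSerializableException immediately if the
// object graph (e.g. a transform Function plus everything it captures)
// cannot be Java-serialized, which is what DStream checkpointing requires.
static void assertSerializable(Object obj) throws Exception {
    ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
    out.writeObject(obj);
    out.close();
}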
The following is the error:
java.io.NotSerializableException: DStream checkpointing has been enabled but the DStreams with their functions are not serializable
org.apache.spark.streaming.api.java.JavaStreamingContext
Serialization stack:
- object not serializable (class: org.apache.spark.streaming.api.java.JavaStreamingContext, value: org.apache.spark.streaming.api.java.JavaStreamingContext@2ed40452)
- field (class: org.necla.ngla.loganalyzer.stateful.Type9.Type9ViolationChecker$6, name: val$ssc, type: class org.apache.spark.streaming.api.java.JavaStreamingContext)
- object (class org.necla.ngla.loganalyzer.stateful.Type9.Type9ViolationChecker$6, org.necla.ngla.loganalyzer.stateful.Type9.Type9ViolationChecker$6@4f9f32a6)
- field (class: org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$transform$1, name: transformFunc$1, type: interface org.apache.spark.api.java.function.Function)
- object (class org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$transform$1, <function1>)
- field (class: org.apache.spark.streaming.dstream.DStream$$anonfun$transform$1$$anonfun$apply$21, name: cleanedF$2, type: interface scala.Function1)
- object (class org.apache.spark.streaming.dstream.DStream$$anonfun$transform$1$$anonfun$apply$21, <function2>)
- field (class: org.apache.spark.streaming.dstream.DStream$$anonfun$transform$2$$anonfun$5, name: cleanedF$3, type: interface scala.Function2)
- object (class org.apache.spark.streaming.dstream.DStream$$anonfun$transform$2$$anonfun$5, <function2>)
- field (class: org.apache.spark.streaming.dstream.TransformedDStream, name: transformFunc, type: interface scala.Function2)
- object (class org.apache.spark.streaming.dstream.TransformedDStream, org.apache.spark.streaming.dstream.TransformedDStream@39983b43)
- writeObject data (class: org.apache.spark.streaming.dstream.DStream)
- object (class org.apache.spark.streaming.dstream.ForEachDStream, org.apache.spark.streaming.dstream.ForEachDStream@4645de69)
- element of array (index: 0)
- array (class [Ljava.lang.Object;, size 16)
- field (class: scala.collection.mutable.ArrayBuffer, name: array, type: class [Ljava.lang.Object;)
- object (class scala.collection.mutable.ArrayBuffer, ArrayBuffer(org.apache.spark.streaming.dstream.ForEachDStream@4645de69, org.apache.spark.streaming.dstream.ForEachDStream@6aeebb54, org.apache.spark.streaming.dstream.ForEachDStream@5f4cbea4, org.apache.spark.streaming.dstream.ForEachDStream@277831dd, org.apache.spark.streaming.dstream.ForEachDStream@4411b869))
- writeObject data (class: org.apache.spark.streaming.dstream.DStreamCheckpointData)
- object (class org.apache.spark.streaming.dstream.DStreamCheckpointData, [
0 checkpoint files
])
- writeObject data (class: org.apache.spark.streaming.dstream.DStream)
- object (class org.apache.spark.streaming.kafka.DirectKafkaInputDStream, org.apache.spark.streaming.kafka.DirectKafkaInputDStream@28b5d9b8)
- element of array (index: 0)
- array (class [Ljava.lang.Object;, size 16)
- field (class: scala.collection.mutable.ArrayBuffer, name: array, type: class [Ljava.lang.Object;)
- object (class scala.collection.mutable.ArrayBuffer, ArrayBuffer(org.apache.spark.streaming.kafka.DirectKafkaInputDStream@28b5d9b8))
- writeObject data (class: org.apache.spark.streaming.DStreamGraph)
- object (class org.apache.spark.streaming.DStreamGraph, org.apache.spark.streaming.DStreamGraph@6a8a4078)
- field (class: org.apache.spark.streaming.Checkpoint, name: graph, type: class org.apache.spark.streaming.DStreamGraph)
- object (class org.apache.spark.streaming.Checkpoint, org.apache.spark.streaming.Checkpoint@4e49e681)
at org.apache.spark.streaming.StreamingContext.validate(StreamingContext.scala:574)
at org.apache.spark.streaming.StreamingContext.liftedTree1$1(StreamingContext.scala:618)
at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:617)
at org.apache.spark.streaming.api.java.JavaStreamingContext.start(JavaStreamingContext.scala:624)
at org.necla.ngla.loganalyzer.stateful.Type9.Type9ViolationCheckerTest.execute(Type9ViolationCheckerTest.java:97)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at com.intellij.junit4.JUnit4TestRunnerUtil$IgnoreIgnoredTestJUnit4ClassRunner.runChild(JUnit4TestRunnerUtil.java:365)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Any help will be greatly appreciated.
Source: https://stackoverflow.com/questions/41747725/sparkstreaming-creating-rdd-and-doing-union-in-a-transform-operation-with-ssc-ch