PySpark Streaming process failed with await termination

Submitted by 丶灬走出姿态 on 2019-12-12 01:16:17

Question


Here is the streaming code I run. After running for two days, it stops on its own. Did I miss something?

def streaming_setup():
    stream = StreamingContext(sc.sparkContext, 10)
    stream.checkpoint(config['checkpointPath'])
    lines_data = stream.textFileStream(monitor_directory)
    lines_data.foreachRDD(persist_file)
    return stream

The Spark Streaming session is started here:

ssc = StreamingContext.getOrCreate(config['checkpointPath'], lambda: streaming_setup())
ssc = streaming_setup()
ssc.start()
ssc.awaitTermination()
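For comparison, here is a hedged sketch of the standard checkpoint-recovery pattern for the legacy DStream API. `config` and `streaming_setup` are the names from the snippet above; note that `StreamingContext.getOrCreate()` already returns a ready-to-start context (recovered from the checkpoint if one exists), so the extra `ssc = streaming_setup()` line in the original code discards the recovered context and replaces it with a fresh, un-checkpointed one, which is a plausible source of the `NullPointerException` in the trace below.

```python
def start_streaming(config, streaming_setup):
    """Start (or recover) a StreamingContext from a checkpoint directory.

    Sketch only: assumes `config['checkpointPath']` and `streaming_setup`
    as defined in the question, and a working Spark installation.
    """
    from pyspark.streaming import StreamingContext  # requires pyspark

    # getOrCreate() either rebuilds the context from the checkpoint or
    # calls streaming_setup() to create a new one. Do NOT call
    # streaming_setup() again afterwards -- that would overwrite the
    # recovered context with a fresh one that shares no checkpoint state.
    ssc = StreamingContext.getOrCreate(config['checkpointPath'],
                                       streaming_setup)
    ssc.start()
    ssc.awaitTermination()
```

This is not a confirmed fix for the failure in the question, only the pattern the official API documents for checkpoint recovery.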

Stack trace:

INFO:py4j.java_gateway:Received command c on object id p2
ERROR:root:Exception Caught ========>An error occurred while calling o91.awaitTermination.
: java.lang.NullPointerException
    at org.apache.spark.storage.BlockManagerMaster.removeRdd(BlockManagerMaster.scala:120)
    at org.apache.spark.SparkContext.unpersistRDD(SparkContext.scala:1796)
    at org.apache.spark.rdd.RDD.unpersist(RDD.scala:216)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$clearMetadata$3.apply(DStream.scala:458)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$clearMetadata$3.apply(DStream.scala:457)
    at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:108)
    at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:108)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:108)
    at org.apache.spark.streaming.dstream.DStream.clearMetadata(DStream.scala:457)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$clearMetadata$5.apply(DStream.scala:470)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$clearMetadata$5.apply(DStream.scala:470)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.streaming.dstream.DStream.clearMetadata(DStream.scala:470)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$clearMetadata$2.apply(DStreamGraph.scala:134)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$clearMetadata$2.apply(DStreamGraph.scala:134)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.streaming.DStreamGraph.clearMetadata(DStreamGraph.scala:134)
    at org.apache.spark.streaming.scheduler.JobGenerator.clearMetadata(JobGenerator.scala:263)
    at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:184)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:89)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:88)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

来源:https://stackoverflow.com/questions/53916299/pyspark-streaming-process-failed-with-await-termination
