Null Pointer Exception When Trying to Use Persisted Table in Spark Streaming

强颜欢笑 submitted on 2020-01-03 05:46:26

Question


I am creating "gpsLookUpTable" at the beginning and persisting it so that I do not need to pull it over and over again to do the mapping. However, when I try to access it inside foreach, I get a null pointer exception. Any help is appreciated, thanks.

Below are the code snippets:

def main(args: Array[String]): Unit = {

  val conf = new SparkConf() ...

  val sc = new SparkContext(conf)
  val ssc = new StreamingContext(sc, Seconds(20))
  val sqc = new SQLContext(sc)

  //////Trying to cache table here to use it below
  val gpsLookUpTable = MapInput.cacheMappingTables(sc, sqc).persist(StorageLevel.MEMORY_AND_DISK_SER_2)
  //sc.broadcast(gpsLookUpTable)
  ssc.textFileStream("hdfs://localhost:9000/inputDirectory/")
    .foreachRDD { rdd =>
      if (!rdd.partitions.isEmpty) {

        val allRows = sc.textFile("hdfs://localhost:9000/supportFiles/GeoHashLookUpTable")
        sqc.read.json(allRows).registerTempTable("GeoHashLookUpTable")
        val header = rdd.first().split(",")
        val rowsWithoutHeader = Utils.dropHeader(rdd)

        rowsWithoutHeader.foreach { row =>

          val singleRowArray = row.split(",")
          singleRowArray.foreach(println)
          (header, singleRowArray).zipped
            .foreach { (x, y) =>
              ///Trying to access persisted table but getting null pointer exception
              val selectedRow = gpsLookUpTable
                .filter("geoCode LIKE '" + GeoHash.subString(lattitude, longitude) + "%'")
                .withColumn("Distance", calculateDistance(col("Lat"), col("Lon")))
                .orderBy("Distance")
                .select("TrackKM", "TrackName").take(1)
              if (selectedRow.length != 0) {
                // do something
              }
              else {
                // do something
              }
            }
        }
      }
    }

Answer 1:


I assume you are running on a cluster, so your foreach runs as a closure on other nodes. The NullPointerException is raised because that closure runs on a node that does not have an initialized gpsLookUpTable. You evidently tried to broadcast gpsLookUpTable in

//sc.broadcast(gpsLookUpTable) 

But the result of the broadcast call needs to be bound to a variable, basically like this:

val tableBC = sc.broadcast(gpsLookUpTable) 

In the foreach, you would replace this:

foreach { (x, y) => 
///Trying to access persisted table but getting null pointer exception 
val selectedRow = gpsLookUpTable 

with this:

foreach { (x, y) => 
///Access the table through the broadcast variable instead 
val selectedRow = tableBC.value 

which gives you access to the broadcast value inside the closure.
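As a minimal sketch of the broadcast pattern described above, the example below binds the broadcast to a variable and reads it through `.value` inside a closure. It rests on several assumptions that are not in the question: the lookup table is small enough to hold on the driver, so a plain local array is broadcast (DataFrame operations themselves can only be issued from the driver, not from inside an executor-side closure); `TrackPoint`, `nearest`, the sample rows, and the Euclidean distance are hypothetical stand-ins for the real schema and for `calculateDistance`; and a `local[2]` master is used only to keep the sketch self-contained.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical row type standing in for one entry of gpsLookUpTable.
case class TrackPoint(geoCode: String, lat: Double, lon: Double, trackKM: Double, trackName: String)

object BroadcastLookupSketch {

  // Plain-Scala lookup that is safe to call inside an executor-side closure:
  // keep the entries whose geoCode starts with the prefix, then pick the nearest one.
  // Euclidean distance is an assumption standing in for calculateDistance.
  def nearest(table: Array[TrackPoint], prefix: String, lat: Double, lon: Double): Option[TrackPoint] = {
    val candidates = table.filter(_.geoCode.startsWith(prefix))
    if (candidates.isEmpty) None
    else Some(candidates.minBy(p => math.hypot(p.lat - lat, p.lon - lon)))
  }

  def run(): Array[String] = {
    val conf = new SparkConf().setAppName("broadcast-lookup-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hard-coded sample data; in the question this would be the lookup
      // table's rows collected to the driver (assumption).
      val localTable = Array(
        TrackPoint("u4pru", 57.64, 10.40, 12.5, "North A"),
        TrackPoint("u4prv", 57.66, 10.42, 13.0, "North B"),
        TrackPoint("u33db", 52.52, 13.40, 1.0, "East"))

      // Bind the broadcast to a variable so the closure can capture it.
      val tableBC = sc.broadcast(localTable)

      // Each element is (geohash prefix, latitude, longitude).
      val points = sc.parallelize(Seq(("u4pr", 57.65, 10.405), ("zzzz", 0.0, 0.0)))

      points.map { case (prefix, lat, lon) =>
        // tableBC.value is the local array, available on every executor.
        nearest(tableBC.value, prefix, lat, lon).map(_.trackName).getOrElse("no match")
      }.collect()
    } finally {
      sc.stop()
    }
  }
}
```

Calling `BroadcastLookupSketch.run()` returns one track name per input point, with "no match" for points whose prefix is not in the table. If the lookup table is too large to collect, a join between the incoming data and the table on the driver side is the usual alternative to broadcasting.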



Source: https://stackoverflow.com/questions/32458109/null-pointer-exception-when-trying-to-use-persisted-table-in-spark-streaming
