Spark doesn't print outputs on the console within the map function

Submitted by 只谈情不闲聊 on 2019-12-31 01:56:10

Question


I have a simple Spark application running in cluster mode.

val funcGSSNFilterHeader = (x: String) => {
    println(!x.contains("servedMSISDN"))
    !x.contains("servedMSISDN")
}

val ssc = new StreamingContext(sc, Seconds(batchIntervalSeconds))
val ggsnFileLines = ssc.fileStream[LongWritable, Text, TextInputFormat]("C:\\Users\\Mbazarganigilani\\Documents\\RA\\GGSN\\Files1", filterF, false)
val ggsnArrays = ggsnFileLines
    .map(x => x._2.toString()).filter(x => funcGSSNFilterHeader(x))

ggsnArrays.foreachRDD(s => {println(x.toString()})

I need to print !x.contains("servedMSISDN") inside the map function for debugging purposes, but nothing is printed on the console.


Answer 1:


Your code runs partly on the driver (the main/master process) and partly on the executors (which run on the worker nodes in cluster mode).

Functions passed to "map" run on the executors.

That is, in cluster mode, a print inside a map function goes to the console of the worker node that ran it, which you won't see on the driver.

In order to debug a program, you can:

  1. Run the code in "local" mode. The prints in the "map function" then appear on the console of your "master/main node", because the executors run on the same machine.

  2. Replace "print to console" with saving to a file, to Elasticsearch, etc.
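As a rough illustration of option 1, the sketch below runs a predicate like the one in the question under a local master. The object name, the application name, and the two sample rows are hypothetical and only there to make the example self-contained; it assumes Spark is on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalDebugSketch {
  // Same check as in the question: print the result, then return it.
  val isNotHeader: String => Boolean = (line: String) => {
    val keep = !line.contains("servedMSISDN")
    println(s"keep=$keep line=$line") // executed by an executor task
    keep
  }

  def main(args: Array[String]): Unit = {
    // "local[*]" runs the driver and the executor threads in one JVM,
    // so executor-side println output lands on this console.
    val conf = new SparkConf().setAppName("local-debug-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample rows standing in for the GGSN files.
      val lines = sc.parallelize(Seq("servedMSISDN,col2", "61400000000,42"))
      // filter is lazy; count() is the action that actually runs the predicate.
      val keptCount = lines.filter(isNotHeader).count()
      println(s"kept $keptCount line(s)") // printed by the driver
    } finally {
      sc.stop()
    }
  }
}
```

With a cluster master instead of "local[*]", the same println calls would still run, but their output would go to the executors' stdout rather than to the driver console.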


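Note that, in addition to the local vs. cluster mode issue, the last line of your code seems to have two typos:

ggsnArrays.foreachRDD(s => {println(x.toString()})

The closing parenthesis of println is missing, and the lambda parameter is named s while the body refers to x. It should be:

ggsnArrays.foreachRDD(s => {println(s.toString)})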



Answer 2:


Two possibilities. First, your logs are on the worker nodes, so you must check the worker logs for these messages. As suggested before, you can run your application in local mode to see the logs on your own machine. By the way, it's better to use a logging library such as SLF4J than plain println, but I assume this is only for learning :)
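A minimal sketch of the SLF4J suggestion, assuming the SLF4J API and a logging backend are on the classpath. The object name HeaderFilter and the method name isNotHeader are hypothetical; only LoggerFactory.getLogger and Logger.info come from SLF4J itself.

```scala
import org.slf4j.{Logger, LoggerFactory}

object HeaderFilter {
  // A Scala object is initialized separately in each JVM, so every executor
  // creates its own logger and nothing has to be serialized with the closure.
  private lazy val log: Logger = LoggerFactory.getLogger(getClass)

  def isNotHeader(line: String): Boolean = {
    val keep = !line.contains("servedMSISDN")
    log.info(s"keep=$keep line=$line") // written to the log of the JVM running the task
    keep
  }
}
```

It could then be used as ggsnFileLines.map(x => x._2.toString()).filter(HeaderFilter.isNotHeader _). Where the messages end up still depends on the logging configuration; in cluster mode they would be in the executor logs on the worker nodes, not on the driver console.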

Second, the snippet contains no ssc.start() or ssc.awaitTermination(). Did you call them? If not, foreachRDD will never be executed. If the rest of the example is correct, add these lines at the end of the script and try again, and remember to check the worker nodes' logs :)
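For reference, here is a sketch of where those two calls would sit in a streaming job. The input directory, the batch interval, and the object name are hypothetical, and textFileStream is used in place of the question's fileStream call only to keep the example short.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingStartSketch {
  // Same header check as in the question, without the println.
  def keepLine(line: String): Boolean = !line.contains("servedMSISDN")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-start-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10)) // hypothetical batch interval

    val lines = ssc.textFileStream("/tmp/ggsn-input") // hypothetical directory
    val kept = lines.filter(keepLine _)

    // The body of foreachRDD runs on the driver once per batch;
    // take() brings a bounded number of rows back to the driver for printing.
    kept.foreachRDD { rdd =>
      rdd.take(10).foreach(println)
    }

    ssc.start()            // the pipeline defined above only starts running here
    ssc.awaitTermination() // keep the application alive so batches continue to run
  }
}
```

Everything before ssc.start() only declares the computation, which is why omitting it leaves foreachRDD silent.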



Source: https://stackoverflow.com/questions/39324082/spark-doesnt-print-outputs-on-the-console-within-the-map-function
