How to view the logs of a spark job after it has completed and the context is closed?

Posted by 旧街凉风 on 2019-12-06 02:47:53

Question


I am running pyspark, spark 1.3, standalone mode, client mode.

I am trying to investigate my spark job by looking at the jobs from the past and comparing them. I want to view their logs, the configuration settings under which the jobs were submitted, etc. But I'm running into trouble viewing the logs of jobs after the context is closed.

When I submit a job, of course I open a spark context. While the job is running, I'm able to open the spark web UI using ssh tunneling. And, I can access the forwarded port by localhost:<port no>. Then I can view the jobs currently running, and the ones that are completed, like this:

Then, if I wish to see the logs of a particular job, I can do so by using ssh tunnel port forwarding to see the logs on a particular port for a particular machine for that job.

Then, sometimes the job fails, but the context is still open. When this happens, I am still able to see the logs by the above method.

But, since I don't want to have all of these contexts open at once, when the job fails, I close the context. When I close the context, the job appears under "Completed Applications" in the image above. Now, when I try to view the logs by using ssh tunnel port forwarding, as before (localhost:<port no>), it gives me a page not found.

How do I view the logs of a job after the context is closed? And, what does this imply about the relationship between the spark context and where the logs are kept? Thank you.

Again, I am running pyspark, spark 1.3, standalone mode, client mode.


Answer 1:


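The Spark event log and the history server exist for this use case. The application web UI is served by the driver that owns the Spark context, so it goes away when the context is closed. With the event log enabled, the application's events are written to persistent storage, and the history server can rebuild the UI from them afterwards.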

Enable the event log

If conf/spark-defaults.conf does not exist, create it from the template:

cp conf/spark-defaults.conf.template conf/spark-defaults.conf

Then add the following configuration to conf/spark-defaults.conf.

# Enable the event log
spark.eventLog.enabled  true

# Where the application writes its event log
spark.eventLog.dir file:///Users/rockieyang/git/spark/spark-events

# Where the history server reads event logs from
spark.history.fs.logDirectory file:///Users/rockieyang/git/spark/spark-events
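
Since the question uses pyspark in client mode, the two application-side settings can also be set per job in code instead of in the defaults file. The following is only a sketch under assumptions: the helper names `event_log_settings` and `make_context` are made up for illustration, and the directory you pass in is expected to exist already, since Spark does not appear to create it for you.

```python
def event_log_settings(log_dir):
    """Return the application-side event log settings as (key, value) pairs.

    log_dir is a URI such as the file:// path used in spark-defaults.conf.
    """
    return [
        ("spark.eventLog.enabled", "true"),
        ("spark.eventLog.dir", log_dir),
    ]


def make_context(app_name, log_dir):
    """Create a SparkContext that writes an event log to log_dir.

    Hypothetical helper: pyspark is imported lazily so that this module
    can be loaded on a machine where pyspark is not installed.
    """
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName(app_name).setAll(event_log_settings(log_dir))
    return SparkContext(conf=conf)
```

A job would then call something like `sc = make_context("my-job", "file:///path/to/spark-events")` and later `sc.stop()`. The `spark.history.fs.logDirectory` setting is read by the history server rather than by the application, so it stays in the configuration file.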

History server

Start the history server:

sbin/start-history-server.sh 

Then browse the history of completed applications. By default the history server listens on port 18080:

http://localhost:18080/
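
To compare the configuration of past jobs without the browser, the event log files can also be read directly. The sketch below assumes the log is uncompressed, that events are stored one JSON object per line, and that the submitted settings appear under the "Spark Properties" key of the `SparkListenerEnvironmentUpdate` event. The function names are hypothetical, and the exact file layout may differ between Spark versions, so lines that are not JSON are skipped.

```python
import json


def read_spark_properties(event_log_path):
    """Return the Spark properties recorded in one application's event log."""
    with open(event_log_path, encoding="utf-8", errors="replace") as log_file:
        for line in log_file:
            try:
                event = json.loads(line)
            except ValueError:
                continue  # header, blank, or truncated line
            if not isinstance(event, dict):
                continue
            if event.get("Event") == "SparkListenerEnvironmentUpdate":
                # dict() accepts either a mapping or a list of pairs
                return dict(event.get("Spark Properties", {}))
    return {}


def diff_properties(old, new):
    """Map each setting that differs to its (old value, new value) pair."""
    keys = sorted(set(old) | set(new))
    return {
        key: (old.get(key), new.get(key))
        for key in keys
        if old.get(key) != new.get(key)
    }
```

Calling `diff_properties(read_spark_properties(path_a), read_spark_properties(path_b))` on two event log files from the event log directory would list the settings that changed between two runs. A setting present in only one run shows `None` on the other side.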



Source: https://stackoverflow.com/questions/38405461/how-to-view-the-logs-of-a-spark-job-after-it-has-completed-and-the-context-is-cl
