Question
I am running pyspark, Spark 1.3, standalone mode, client mode.
I am trying to investigate my Spark jobs by looking at past runs and comparing them. I want to view their logs, the configuration settings they were submitted with, and so on. But I'm having trouble viewing a job's logs after its context is closed.
When I submit a job, I of course open a SparkContext. While the job is running, I can open the Spark web UI through an SSH tunnel and reach the forwarded port at localhost:<port no>. There I can see the jobs that are currently running and the ones that have completed, like this:
If I want to see the logs of a particular job, I can use SSH port forwarding again to open the log page served on a specific port of the machine that ran that job.
Sometimes the job fails while the context is still open. In that case I can still see the logs using the method above.
But since I don't want to keep all of these contexts open at once, I close the context when a job fails. Once I close it, the application appears under "Completed Applications" in the image above. Now, when I try to view the logs through SSH port forwarding as before (localhost:<port no>), I get a "page not found" error.
How do I view the logs of a job after the context is closed? And what does this imply about the relationship between the SparkContext and where the logs are kept? Thank you.
Again, I am running pyspark, Spark 1.3, standalone mode, client mode.
Answer 1:
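The Spark event log and history server are meant for exactly this use case. An application's web UI is served by its driver while the SparkContext is running, so it goes away once the context is stopped. With event logging enabled, Spark records the application's events to storage, and the history server can rebuild the UI from those records after the application has finished.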
Enable event log
If conf/spark-defaults.conf does not exist, create it from the template:
cp conf/spark-defaults.conf.template conf/spark-defaults.conf
Then add the following configuration to conf/spark-defaults.conf:
# Enable the event log
spark.eventLog.enabled true
# Where to store the event logs
spark.eventLog.dir file:///Users/rockieyang/git/spark/spark-events
# Where the history server reads the event logs from
spark.history.fs.logDirectory file:///Users/rockieyang/git/spark/spark-events
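The two steps above (copying the template, then adding the three settings) can also be scripted. The following is a minimal sketch, assuming you run it on the driver machine against your Spark installation directory and use a local file:// event-log directory. The helper name enable_event_log and the /tmp/spark-events path are hypothetical placeholders, not part of Spark. The script also creates the log directory, because Spark expects spark.eventLog.dir to exist already.

```python
import os
import shutil

# Hypothetical placeholder paths; both keys point at the same directory
# so the history server reads what the driver writes.
EVENT_LOG_SETTINGS = {
    "spark.eventLog.enabled": "true",
    "spark.eventLog.dir": "file:///tmp/spark-events",
    "spark.history.fs.logDirectory": "file:///tmp/spark-events",
}


def enable_event_log(spark_home, settings=EVENT_LOG_SETTINGS):
    """Add event-log settings to <spark_home>/conf/spark-defaults.conf (hypothetical helper)."""
    conf_path = os.path.join(spark_home, "conf", "spark-defaults.conf")
    template_path = conf_path + ".template"

    # Equivalent of: cp conf/spark-defaults.conf.template conf/spark-defaults.conf
    if not os.path.exists(conf_path) and os.path.exists(template_path):
        shutil.copyfile(template_path, conf_path)

    text = ""
    if os.path.exists(conf_path):
        with open(conf_path) as f:
            text = f.read()

    # Collect keys that are already set; lines starting with '#' are comments.
    existing = set()
    for line in text.splitlines():
        parts = line.split()
        if parts and not parts[0].startswith("#"):
            existing.add(parts[0])

    with open(conf_path, "a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # don't glue the first new key onto the last line
        for key, value in settings.items():
            if key not in existing:
                f.write("%s %s\n" % (key, value))

    # Create the local event-log directory if it is missing.
    log_dir = settings["spark.eventLog.dir"]
    if log_dir.startswith("file://"):
        os.makedirs(log_dir[len("file://"):], exist_ok=True)
    return conf_path
```

Running it a second time is harmless: keys that are already present are skipped rather than duplicated.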
History server
Start the history server:
sbin/start-history-server.sh
Then open the history server's web UI to browse past applications. By default it listens on port 18080:
http://localhost:18080/
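The history server rebuilds the UI from the event logs, but the logs are also plain text you can read directly. This helps with the question's goal of comparing the configuration of past runs. The following is a minimal sketch, assuming an uncompressed event log (spark.eventLog.compress left at its default of false) that stores one JSON event per line. The field names it reads ("Event", "App Name", "App ID", "Spark Properties") are the ones Spark's JSON event format uses to the best of my knowledge, so check them against one of your own log files. The helpers read_event_log and diff_properties are hypothetical. Note that the event log records job, stage, and task events plus configuration, not the text your driver or executors print.

```python
import json


def read_event_log(path):
    """Return (app_info, spark_properties) from one uncompressed event log file."""
    app_info, spark_props = {}, {}
    with open(path) as f:
        for line in f:
            try:
                event = json.loads(line)
            except ValueError:
                continue  # not a JSON event (e.g. a header or blank line)
            if not isinstance(event, dict):
                continue
            kind = event.get("Event")
            if kind == "SparkListenerApplicationStart":
                app_info = {
                    "name": event.get("App Name"),
                    "id": event.get("App ID"),
                    "start_ms": event.get("Timestamp"),
                }
            elif kind == "SparkListenerEnvironmentUpdate":
                # The Spark properties the application actually ran with.
                spark_props = dict(event.get("Spark Properties", {}))
    return app_info, spark_props


def diff_properties(a, b):
    """Map each key whose value differs between two runs to (value_in_a, value_in_b)."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}
```

For example, you could call read_event_log on the logs of two applications and pass both property maps to diff_properties. The result lists only the settings, such as spark.executor.memory, that changed between the runs.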
Source: https://stackoverflow.com/questions/38405461/how-to-view-the-logs-of-a-spark-job-after-it-has-completed-and-the-context-is-cl