Question
Where are the Dataproc Spark job logs located? I know there are driver logs under the "Logging" section, but what about the executor nodes? Also, where are the detailed steps that Spark is executing logged (I know I can see them in the Application Master)? I am attempting to debug a script that seems to hang; Spark appears to freeze.
Answer 1:
The task logs are stored on each worker node under /tmp.
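For example, you can SSH into a worker and inspect a container's logs directly (a sketch assuming YARN's standard per-container log layout; the worker name is a placeholder following Dataproc's <cluster>-w-<n> naming, and the application/container IDs are placeholders):
gcloud compute ssh my-cluster-w-0
ls /tmp/<application_id>/<container_id>/
# typically contains the stdout, stderr, and syslog files for that container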
It is possible to collect them in one place via YARN log aggregation. Set these properties at cluster creation time (via --properties with the yarn: prefix; see the example command after the property list):
yarn.log-aggregation-enable=true
yarn.nodemanager.remote-app-log-dir=gs://${LOG_BUCKET}/logs
yarn.log-aggregation.retain-seconds=-1
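For instance, a cluster created with these settings might look like this (a sketch; the cluster name, region, and bucket are placeholders):
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --properties='yarn:yarn.log-aggregation-enable=true,yarn:yarn.nodemanager.remote-app-log-dir=gs://my-log-bucket/logs,yarn:yarn.log-aggregation.retain-seconds=-1'
Once aggregation is enabled and an application finishes, you can pull the combined logs from any cluster node with the standard YARN CLI:
yarn logs -applicationId <application_id>
Note that yarn.log-aggregation.retain-seconds=-1 keeps aggregated logs indefinitely; set a positive value (in seconds) if you want them cleaned up automatically.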
Here's an article that discusses log management:
https://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
Source: https://stackoverflow.com/questions/47342132/where-are-the-individual-dataproc-spark-logs