Apache Spark: YARN Log Analysis


Question


I have a Spark Streaming application, and I want to analyse the job's logs using Elasticsearch and Kibana. The job runs on a YARN cluster, and since I have set yarn.log-aggregation-enable to true, the logs are aggregated to HDFS. But when I try to do this:

hadoop fs -cat ${yarn.nodemanager.remote-app-log-dir}/${user.name}/logs/<application ID>

I see what looks like encrypted or compressed data. What file format is this? How can I read the logs from this file? Can I use Logstash to read it?

Also, if there is a better approach to analyse Spark logs, I am open to your suggestions.

Thanks.


Answer 1:


The format is called TFile, and it is a compressed, binary container format, which is why hadoop fs -cat shows unreadable data.

YARN chooses to write the aggregated application logs as TFiles. For those of you who don't know what a TFile is (and I bet a lot of you don't), this basic definition should suffice: "A TFile is a container of key-value pairs. Both keys and values are type-less bytes" (quoted from the "Splunk / Hadoop Rant" blog post).
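The answer stops at identifying the format, but it is worth adding that the standard YARN CLI can decode these TFiles for you, so there is no need to parse them by hand. A minimal example, assuming log aggregation has completed and the application ID is known (the output file name is illustrative):

# Dump the aggregated logs for one application as plain text
yarn logs -applicationId <application ID>

# Optionally redirect to a file that Logstash could then pick up
yarn logs -applicationId <application ID> > app_logs.txt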

There may be a way to edit YARN's and Spark's log4j.properties so that log messages are sent to Logstash using a SocketAppender. However, that method is being deprecated.
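For illustration, here is a minimal sketch of such a log4j.properties, assuming log4j 1.x (the version Spark 1.x/2.x ship with) and a Logstash input listening on port 4560; the hostname, port, and appender name are hypothetical:

# Route root logging to the console and to a remote socket
log4j.rootLogger=INFO, console, logstash

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# SocketAppender serializes LoggingEvent objects to a remote host;
# Logstash needs a matching log4j input to decode them
log4j.appender.logstash=org.apache.log4j.net.SocketAppender
log4j.appender.logstash.RemoteHost=logstash.example.com
log4j.appender.logstash.Port=4560
log4j.appender.logstash.ReconnectionDelay=10000

To have Spark pick this up, the file can be shipped with spark-submit --files and activated via -Dlog4j.configuration= in spark.driver.extraJavaOptions and spark.executor.extraJavaOptions. Note that Logstash's log4j input, which decodes these serialized events, is the part that has been deprecated, which is the drawback the answer alludes to.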



Source: https://stackoverflow.com/questions/35198677/apache-spark-yarn-logs-analysis
