Monitoring Apache Spark with Prometheus

Submitted by 浪子不回头ぞ on 2020-02-18 08:09:16

Question


I have read that Spark does not have Prometheus as one of its pre-packaged sinks. So I found this post on how to monitor Apache Spark with Prometheus.

But I found it difficult to understand and to get working, because I am a beginner and this is my first time working with Apache Spark.

The first thing I do not get is: what do I need to do?

  • Do I need to change metrics.properties?

  • Should I add some code to the app, or something else?

I do not get what the steps are to make it work...

What I have done so far is change the properties as shown in the link and pass this flag:

--conf spark.metrics.conf=<path_to_the_file>/metrics.properties

What else do I need to do to see metrics from Apache Spark?

I also found these links: Monitoring Apache Spark with Prometheus

https://argus-sec.com/monitoring-spark-prometheus/

But I could not make it work with those either...

I have read that there is a way to get metrics into Graphite and then export them to Prometheus, but I could not find any useful documentation.


Answer 1:


There are a few ways to monitor Apache Spark with Prometheus.

One of them is JmxSink + jmx-exporter.

Preparations

  • Uncomment *.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink in spark/conf/metrics.properties (see the snippet after this list)
  • Download jmx-exporter from the prometheus/jmx_exporter releases
  • Download the example Prometheus config file (spark.yml)
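
For reference, the relevant line in spark/conf/metrics.properties looks like this once uncommented; it enables the JMX sink for all Spark instances (driver, executors, master, workers):

# spark/conf/metrics.properties: expose Spark metrics over JMX
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink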

Use it in spark-shell or spark-submit

In the following command, the jmx_prometheus_javaagent-0.3.1.jar file and spark.yml were downloaded in the previous steps; the paths may need to be changed accordingly.

bin/spark-shell --conf "spark.driver.extraJavaOptions=-javaagent:jmx_prometheus_javaagent-0.3.1.jar=8080:spark.yml" 
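
The same Java agent option works with spark-submit. A minimal sketch, where the main class com.example.MyApp and the jar my-app.jar are hypothetical placeholders for your own application:

bin/spark-submit \
  --conf "spark.driver.extraJavaOptions=-javaagent:jmx_prometheus_javaagent-0.3.1.jar=8080:spark.yml" \
  --class com.example.MyApp \
  my-app.jar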

Access it

After running, we can access the metrics at localhost:8080/metrics.
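
A quick sanity check that the exporter is up, assuming the port configured above:

curl -s http://localhost:8080/metrics | head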

Next

You can then configure Prometheus to scrape the metrics from jmx-exporter, as sketched below.

NOTE: We have to handle the discovery part properly if it is running in a cluster environment.
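
A minimal static scrape configuration for prometheus.yml; the job name is an assumption, and in a cluster you would replace the static target with proper service discovery as noted above:

scrape_configs:
  - job_name: 'spark'                 # illustrative job name
    static_configs:
      - targets: ['localhost:8080']   # the jmx-exporter javaagent port chosen above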




Answer 2:


I have followed the GitHub readme and it worked for me (the original blog post assumes that you use the Banzai Cloud fork, as they expected their PR to be accepted upstream). They externalized the sink into a standalone project (https://github.com/banzaicloud/spark-metrics), and I used that to make it work with Spark 2.3.
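
For orientation, wiring a custom sink into Spark is again done through metrics.properties. The sketch below is illustrative only; the angle-bracket values are placeholders, and the actual sink class name and configuration keys are documented in the spark-metrics README:

# Illustrative placeholders only; consult the banzaicloud/spark-metrics README for the exact keys
*.sink.prometheus.class=<PrometheusSink class from the README>
*.sink.prometheus.pushgateway-address=<pushgateway-host>:<port>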

Actually, you can scrape (with Prometheus) through JMX, and in that case you do not need the sink. The Banzai Cloud folks did a post about how they use JMX for Kafka, but you can do this for any JVM.

So basically you have two options:

  • use the sink, or

  • go through JMX.

They have open-sourced both options.



Source: https://stackoverflow.com/questions/49488956/monitoring-apache-spark-with-prometheus
