Right now I'm using spark-submit to launch an application in cluster mode. The response from the master server gives a JSON object with a submissionId, which I use to identify the application.
I had to scrape the Spark master web UI for an application id that's close (within the same minute and with the same suffix, e.g. 20161010025XXX-0005 with X as a wildcard), then look for the worker URL in the link tag after it. It's not pretty, reliable, or secure, but it'll work for now. I'm leaving this open for a bit in case someone has another approach.
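The fuzzy matching described above could be sketched with a simple wildcard filter. This is a hypothetical illustration, not the actual scraping code; the application ids below are made up to mirror the pattern in the example.

```python
import fnmatch

# Hypothetical list of application ids scraped from the master web UI.
# Standalone-mode ids look like app-<timestamp>-<counter>.
ids = ["app-20161010025312-0004", "app-20161010025330-0005"]

# Match ids from the same minute (prefix app-20161010025) with the
# expected -0005 suffix; ??? stands in for the unknown seconds digits.
matches = fnmatch.filter(ids, "app-20161010025???-0005")
```

Here `matches` would contain only `app-20161010025330-0005`, since the other id has a different counter suffix.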
Does the master server's response not provide an application-id?
I believe all you need for this is the master URL and the application-id of your application. Once you have the application-id, use port 4040 at the master URL and append your intended endpoint to it.
For example, if your application-id is application_1468141556944_1055:
To get the list of all jobs:
http://<master>:4040/api/v1/applications/application_1468141556944_1055/jobs
To get the list of stored RDDs:
http://<master>:4040/api/v1/applications/application_1468141556944_1055/storage/rdd
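The endpoint URLs above can be built with a small helper; the `spark_api_url` function and the `master-host` name here are my own illustration, not part of Spark's API.

```python
def spark_api_url(master, app_id, endpoint):
    """Build a Spark REST API URL for the given application id and
    endpoint, e.g. 'jobs' or 'storage/rdd'."""
    return f"http://{master}:4040/api/v1/applications/{app_id}/{endpoint}"

jobs_url = spark_api_url("master-host", "application_1468141556944_1055", "jobs")

# To actually fetch the JSON while the application is running:
# import json, urllib.request
# jobs = json.load(urllib.request.urlopen(jobs_url))
```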
However, if you don't have the application-id, I would probably start with the following:
Set verbose mode (--verbose) while launching the Spark job to get the application id on the console. You can then parse the log output for the application-id. The log output usually looks like:
16/08/12 08:50:53 INFO Client: Application report for application_1468141556944_3791 (state: RUNNING)
Thus, the application-id is application_1468141556944_3791.
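Parsing the application-id out of a line like the one above is a one-line regex; this sketch hardcodes the example log line rather than reading a real log stream.

```python
import re

# Example log line from spark-submit --verbose output, as shown above.
line = ("16/08/12 08:50:53 INFO Client: Application report for "
        "application_1468141556944_3791 (state: RUNNING)")

# YARN application ids have the form application_<cluster-ts>_<counter>.
match = re.search(r"Application report for (application_\d+_\d+)", line)
app_id = match.group(1) if match else None
print(app_id)  # application_1468141556944_3791
```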
You can also find the master URL and application-id through the tracking URL in the log output, which looks like:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.50.0.33
ApplicationMaster RPC port: 0
queue: ns_debug
start time: 1470992969127
final status: UNDEFINED
tracking URL: http://<master>:8088/proxy/application_1468141556944_3799/
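Both the master host and the application-id can be pulled from that tracking URL with one regex. The host name `mymaster` below is a placeholder standing in for the `<master>` shown above.

```python
import re

# Example tracking-URL line from the application report, with a
# placeholder host in place of <master>.
report = "tracking URL: http://mymaster:8088/proxy/application_1468141556944_3799/"

# Capture the host between http:// and the port, and the application id
# after /proxy/.
m = re.search(r"tracking URL:\s*http://([^:/]+):\d+/proxy/(application_\d+_\d+)", report)
if m:
    master, app_id = m.group(1), m.group(2)
```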
These messages are logged at the INFO level, so make sure you set log4j.rootCategory=INFO, console in your log4j.properties file so that you can see them.