Right now I'm using spark-submit to launch an application in cluster mode. The response from the master server gives a JSON object with a submissionId, which I use to identify the application.
Does the master server's response not provide an application-id?
I believe all you need for this is the master URL and your application's application-id. Once you have the application-id, use port 4040 at the master URL and append your intended endpoint to it.
For example, if your application id is application_1468141556944_1055:
To get the list of all jobs:
http://:4040/api/v1/applications/application_1468141556944_1055/jobs
To get the list of stored RDDs:
http://:4040/api/v1/applications/application_1468141556944_1055/storage/rdd
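If you want to hit those endpoints from a script rather than a browser, a minimal sketch could look like the following. The host localhost and the hard-coded application id are placeholders you would substitute with your own values; only the /api/v1/applications/... path comes from the endpoints above.

import json
import urllib.request

# Placeholder values: substitute your driver/master host and your own application id.
app_id = "application_1468141556944_1055"
base_url = "http://localhost:4040/api/v1/applications"

# Fetch the list of jobs for the application and print a short summary of each.
with urllib.request.urlopen(f"{base_url}/{app_id}/jobs") as resp:
    jobs = json.load(resp)

for job in jobs:
    print(job["jobId"], job["status"], job["name"])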
However, if you don't have the application-id, I would probably start with the following:
Set verbose mode (--verbose) when launching the Spark job to get the application id on the console. You can then parse the log output for the application-id; it usually looks like:
16/08/12 08:50:53 INFO Client: Application report for application_1468141556944_3791 (state: RUNNING)
Thus, the application-id is application_1468141556944_3791.
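If you capture the spark-submit output to a file, a small sketch like this could pull the id out of it. The file name spark-submit.log is an assumption; pipe or tee your own output into whatever file you use.

import re

# Application ids have the form application_<cluster-timestamp>_<counter>.
APP_ID_RE = re.compile(r"application_\d+_\d+")

# Assumed file name: the captured spark-submit --verbose output.
with open("spark-submit.log") as log:
    for line in log:
        if "Application report for" in line:
            match = APP_ID_RE.search(line)
            if match:
                print("application-id:", match.group(0))
                break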
You can also find the master URL and application-id through the tracking URL in the log output, which looks like:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.50.0.33
ApplicationMaster RPC port: 0
queue: ns_debug
start time: 1470992969127
final status: UNDEFINED
tracking URL: http://:8088/proxy/application_1468141556944_3799/
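The same captured log can be scanned for the tracking URL, which bundles the proxy host and the application id together. Again, spark-submit.log is an assumed file name for the captured output.

import re

# The report block prints a line such as "tracking URL: http://<host>:8088/proxy/<app-id>/".
TRACKING_RE = re.compile(r"tracking URL:\s*(\S+)")

with open("spark-submit.log") as log:
    for line in log:
        match = TRACKING_RE.search(line)
        if match:
            print("tracking URL:", match.group(1))
            break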
These messages are at the INFO log level, so make sure you set log4j.rootCategory=INFO, console in your log4j.properties file so that you can see them.
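For reference, a console configuration along the lines of Spark's bundled conf/log4j.properties.template (which already defaults the root category to INFO) looks like this; treat it as a starting point rather than the exact file your cluster ships with.

log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n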