Right now I'm using spark-submit to launch an application in cluster mode. The response from the master server gives a JSON object with a submissionId, which I use to identify the application.
Does the master server's response not provide an application-id?
I believe all you need for this is the master URL and your application's application-id. Once you have the application-id, use port 4040 at the master URL and append your intended endpoint to it.
For example, if your application id is application_1468141556944_1055:
To get the list of all jobs:
http://:4040/api/v1/applications/application_1468141556944_1055/jobs
To get the list of stored RDDs:
http://:4040/api/v1/applications/application_1468141556944_1055/storage/rdd
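If you want to hit those endpoints from a script rather than a browser, a minimal sketch could look like the following. The host localhost and the hard-coded application id are placeholders you would substitute with your own values; only the /api/v1/applications/... path comes from the endpoints above.

import json
import urllib.request

# Placeholder values: substitute your driver/master host and your own application id.
app_id = "application_1468141556944_1055"
base_url = "http://localhost:4040/api/v1/applications"

# Fetch the list of jobs for the application and print a short summary of each.
with urllib.request.urlopen(f"{base_url}/{app_id}/jobs") as resp:
    jobs = json.load(resp)

for job in jobs:
    print(job["jobId"], job["status"], job["name"])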
However, if you don't have the application-id, I would probably start with the following:
Set verbose mode (--verbose) when launching the Spark job to get the application id on the console. You can then parse the log output for the application-id; it usually looks like:
16/08/12 08:50:53 INFO Client: Application report for application_1468141556944_3791 (state: RUNNING)
Thus, the application-id is application_1468141556944_3791.
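If you capture the spark-submit output to a file, a small sketch like this could pull the id out of it. The file name spark-submit.log is an assumption; pipe or tee your own output into whatever file you use.

import re

# Application ids have the form application_<cluster-timestamp>_<counter>.
APP_ID_RE = re.compile(r"application_\d+_\d+")

# Assumed file name: the captured spark-submit --verbose output.
with open("spark-submit.log") as log:
    for line in log:
        if "Application report for" in line:
            match = APP_ID_RE.search(line)
            if match:
                print("application-id:", match.group(0))
                break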
You can also find the master URL and application-id through the tracking URL in the log output, which looks like:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.50.0.33
ApplicationMaster RPC port: 0
queue: ns_debug
start time: 1470992969127
final status: UNDEFINED
tracking URL: http://:8088/proxy/application_1468141556944_3799/
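The same captured log can be scanned for the tracking URL, which bundles the proxy host and the application id together. Again, spark-submit.log is an assumed file name for the captured output.

import re

# The report block prints a line such as "tracking URL: http://<host>:8088/proxy/<app-id>/".
TRACKING_RE = re.compile(r"tracking URL:\s*(\S+)")

with open("spark-submit.log") as log:
    for line in log:
        match = TRACKING_RE.search(line)
        if match:
            print("tracking URL:", match.group(1))
            break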
These messages are at the INFO log level, so make sure you set log4j.rootCategory=INFO, console in your log4j.properties file so that you can see them.
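For reference, a console configuration along the lines of Spark's bundled conf/log4j.properties.template (which already defaults the root category to INFO) looks like this; treat it as a starting point rather than the exact file your cluster ships with.

log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n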