Spark: Monitoring a cluster mode application

春和景丽 2021-01-22 17:33

Right now I'm using spark-submit to launch an application in cluster mode. The response from the master server gives a JSON object with a submissionId which I use to identify

2 answers
  • 2021-01-22 18:00

    I had to scrape the Spark master web UI for an application id that is close to the submission id (within the same minute and with the same suffix, e.g. 20161010025XXX-0005 with X as a wildcard), then read the worker URL from the link tag that follows it. It's not pretty, reliable, or secure, but it will work for now. Leaving this open for a bit in case someone has another approach.
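
    The matching rule described above could be sketched roughly as follows. This is only an illustration: the function name is_close_match and the "driver-"/"app-" prefixes are assumptions about how the ids look on the master page, and the scraping of the page itself is left out.

```python
import re

# Assumed id layout (not confirmed by the thread): "<prefix>-<yyyyMMddHHmmss>-<NNNN>"
ID_PATTERN = re.compile(r"^(?:app|driver)-(\d{14})-(\d{4})$")


def is_close_match(submission_id: str, app_id: str, prefix_len: int = 11) -> bool:
    """Return True when both ids share the numeric suffix and the first
    `prefix_len` timestamp digits (11 mirrors the 20161010025XXX wildcard)."""
    sub = ID_PATTERN.match(submission_id)
    app = ID_PATTERN.match(app_id)
    if not sub or not app:
        return False
    same_suffix = sub.group(2) == app.group(2)
    same_window = sub.group(1)[:prefix_len] == app.group(1)[:prefix_len]
    return same_suffix and same_window
```

    Passing prefix_len=12 would compare the timestamps down to the minute. As noted, this kind of matching is a heuristic and can pick the wrong application when several are submitted close together.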

  • 2021-01-22 18:14

    Doesn't the master server's response provide the application-id?

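    I believe all you need for this problem is the master URL and the application-id of your application. Once you have the application-id, query port 4040 at the master URL and append the endpoint you want. (Port 4040 is the default port of the running application's UI, which is served from the host where the driver runs, so in cluster mode that host may differ from the master.)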

    For example, suppose your application id is application_1468141556944_1055.

    To get the list of all jobs:

    http://<master>:4040/api/v1/applications/application_1468141556944_1055/jobs
    

    To get the list of stored RDDs:

    http://<master>:4040/api/v1/applications/application_1468141556944_1055/storage/rdd
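
    As a rough sketch, the two URLs above can be assembled with a small helper. The name rest_url and its parameters are made up for illustration; only the /api/v1/applications/... path layout comes from the examples above.

```python
from urllib.parse import quote


def rest_url(host: str, app_id: str, resource: str = "", port: int = 4040) -> str:
    """Build a monitoring REST URL for one application.

    `resource` is the part after the application id, e.g. "jobs" or "storage/rdd".
    """
    base = f"http://{host}:{port}/api/v1/applications/{quote(app_id, safe='')}"
    return f"{base}/{resource.strip('/')}" if resource else base
```

    For instance, rest_url("<master>", "application_1468141556944_1055", "jobs") gives the first URL shown above. The result can then be fetched with any HTTP client, such as urllib.request.urlopen, and the response body is JSON.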
    

    However, if you don't have the application-id, I would probably start with the following:

    Set verbose mode (--verbose) when launching the Spark job so that the application id is printed to the console. You can then parse the log output for the application-id. The log output usually looks like this:

    16/08/12 08:50:53 INFO Client: Application report for application_1468141556944_3791 (state: RUNNING)
    

    Thus, the application-id is application_1468141556944_3791.
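
    Parsing the id out of the captured output could look something like the sketch below. The helper name find_application_id is hypothetical, and the regular expression simply assumes ids of the form application_<digits>_<digits>, as in the log line above.

```python
import re
from typing import Iterable, Optional

# Matches ids shaped like application_1468141556944_3791
APP_ID_RE = re.compile(r"\bapplication_\d+_\d+\b")


def find_application_id(lines: Iterable[str]) -> Optional[str]:
    """Return the first application id found in the given log lines, or None."""
    for line in lines:
        match = APP_ID_RE.search(line)
        if match:
            return match.group(0)
    return None
```

    The lines can come from wherever the spark-submit output was captured, for example a log file opened for reading or the split text of a subprocess's output.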

    You can also find the master URL and the application-id through the tracking URL in the log output, which looks like this:

        client token: N/A
        diagnostics: N/A
        ApplicationMaster host: 10.50.0.33
        ApplicationMaster RPC port: 0
        queue: ns_debug
        start time: 1470992969127
        final status: UNDEFINED
        tracking URL: http://<master>:8088/proxy/application_1468141556944_3799/
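
    Splitting such a tracking URL into its host and application-id might be sketched like this. TrackingInfo and parse_tracking_url are illustrative names, and the code assumes the id appears as a path segment starting with "application_", as in the output above.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class TrackingInfo(NamedTuple):
    host: str
    port: Optional[int]
    application_id: str


def parse_tracking_url(url: str) -> TrackingInfo:
    """Extract host, port and application id from a tracking URL."""
    parsed = urlparse(url.strip())
    segments = [s for s in parsed.path.split("/") if s]
    app_ids = [s for s in segments if s.startswith("application_")]
    if parsed.hostname is None or not app_ids:
        raise ValueError(f"not a recognizable tracking URL: {url!r}")
    return TrackingInfo(parsed.hostname, parsed.port, app_ids[0])
```

    The value after "tracking URL:" in the report is what would be passed in.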
    

    These messages are logged at INFO level, so make sure the log4j.properties file sets log4j.rootCategory=INFO, console; otherwise you will not see them.
