Spark UI on AWS EMR

清歌不尽 2021-02-07 10:41

I am running an AWS EMR cluster with Spark (1.3.1) installed via the EMR console dropdown. Spark is currently running and processing data, but I am trying to find which port has been assign

5 Answers
  • 2021-02-07 11:22

    Simply use an SSH tunnel. On your local machine, run:

    ssh -i /path/to/pem -L 3000:ec2-xxxxcompute-1.amazonaws.com:8088 hadoop@ec2-xxxxcompute-1.amazonaws.com

    Then, in a browser on your local machine, open:

    localhost:3000

  • 2021-02-07 11:31

    Spark on EMR is configured to run on YARN, so the Spark UI is reached through the application URL provided by the YARN Resource Manager (http://spark.apache.org/docs/latest/monitoring.html). The easiest way to get to it is to set up your browser to use a SOCKS proxy on a port opened by SSH, then open the Resource Manager from the EMR console and click the Application Master URL shown to the right of the running application. The Spark History Server is available on its default port, 18080.

    An example of a SOCKS setup with EMR is at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-web-interfaces.html
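
    As a rough sketch of the "port opened by SSH" step: ssh's -D option opens a local SOCKS proxy that forwards traffic through the master node. The helper name socks_tunnel_cmd and the default port 8157 are assumptions chosen for illustration, and the key path and host name are placeholders. The function only prints the command so you can inspect it before running it.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helper: print (do not run) an ssh command that opens a
    # SOCKS proxy to the EMR master node.
    socks_tunnel_cmd() {
      local key_path="$1" master_dns="$2" socks_port="${3:-8157}"
      # -N: do not run a remote command, only forward ports
      # -D: dynamic (SOCKS) forwarding on the given local port
      printf 'ssh -i %s -N -D %s hadoop@%s\n' "$key_path" "$socks_port" "$master_dns"
    }

    # Placeholder key path and master DNS name; substitute your own.
    socks_tunnel_cmd /path/to/pem ec2-xxxxcompute-1.amazonaws.com
    ```

    Once the printed command is running, the browser still has to be pointed at the SOCKS proxy on localhost and the chosen port, as described in the EMR docs linked above.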

  • 2021-02-07 11:32

    Here is an alternative if you don't want to deal with the browser SOCKS setup suggested in the EMR docs.

    1. Open an SSH tunnel to the master node, with port forwarding to the machine running the Spark UI

      ssh -i path/to/aws.pem  -L 4040:SPARK_UI_NODE_URL:4040 hadoop@MASTER_URL
      

      MASTER_URL (EMR_DNS in the question) is the address of the master node, which you can get from the EMR Management Console page for the cluster.

      SPARK_UI_NODE_URL can be seen near the top of the stderr log. The log line will look something like:

      16/04/28 21:24:46 INFO SparkUI: Started SparkUI at http://10.2.5.197:4040
      
    2. Point your browser to localhost:4040

    Tried this on EMR 4.6 running Spark 1.6.1
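
    If it helps, pulling SPARK_UI_NODE_URL out of the log can be scripted. This is only a sketch: the function name spark_ui_host is made up for illustration, and it assumes the log line has the same shape as the one quoted above. It prints the resulting ssh command rather than running it.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helper: read log lines on stdin and print the host from
    # the first "Started SparkUI at http://HOST:PORT" line.
    spark_ui_host() {
      sed -n 's|.*Started SparkUI at http://\([^:/]*\):[0-9]*.*|\1|p' | head -n 1
    }

    # Sample line copied from the stderr log shown in this answer.
    log_line='16/04/28 21:24:46 INFO SparkUI: Started SparkUI at http://10.2.5.197:4040'
    host="$(printf '%s\n' "$log_line" | spark_ui_host)"

    # Print the tunnel command; the key path and MASTER_URL stay placeholders.
    echo "ssh -i path/to/aws.pem -L 4040:${host}:4040 hadoop@MASTER_URL"
    ```

    In practice you would pipe the real stderr log into spark_ui_host instead of the sample line.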

  • 2021-02-07 11:39

    Glad to announce that this feature is finally available on AWS. You no longer need to run any special commands or configure an SSH tunnel.

    By clicking the link to the Spark History Server UI, you can view the logs of past applications or access the UI of a running Spark job.

    For more details: https://docs.aws.amazon.com/emr/latest/ManagementGuide/app-history-spark-UI.html

    I hope it helps !

  • 2021-02-07 11:45

    Just run the following command:

    ssh -i /your-path/aws.pem -N -L 20888:ip-172-31-42-70.your-region.compute.internal:20888 hadoop@ec2-xxx.compute.amazonaws.com.cn
    

    There are 3 places you need to change:

    1. your .pem file
    2. your internal master node IP
    3. your public DNS domain.

    Finally, in the YARN UI, click the Tracking URL of your Spark application, then replace the internal host in the URL with localhost:

    "http://your-internal-ip:20888/proxy/application_1558059200084_0002/" 
    
    ->
    
    "http://localhost:20888/proxy/application_1558059200084_0002/"
    

    It worked for EMR 5.x
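
    The URL replacement can also be done with a small helper instead of by hand. This is a sketch only: to_local_url is a made-up name, and it assumes the tunnel forwards the same port number locally, as in the command above.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helper: swap the scheme and host of a tracking URL for
    # http://localhost, keeping the port and path unchanged.
    to_local_url() {
      printf '%s\n' "$1" | sed -E 's|^https?://[^:/]+|http://localhost|'
    }

    to_local_url "http://your-internal-ip:20888/proxy/application_1558059200084_0002/"
    ```

    The printed URL can be pasted into the browser while the tunnel is open.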
