Question
In Google Cloud Dataproc how can I access the Spark or Hadoop job history servers? I want to be able to look at my job history details when I run jobs.
Answer 1:
To do this, you will need to create an SSH tunnel to the cluster and then use a SOCKS proxy with your browser. This is because, while the web interfaces are running on the cluster, firewall rules prevent anyone from connecting to them directly (for security).
To access the Spark or Hadoop job history server, you will first need to create an SSH tunnel to the master node of your cluster:
gcloud compute ssh --zone=<master-host-zone> \
--ssh-flag="-D 1080" --ssh-flag="-N" --ssh-flag="-n" <master-host-name>
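For example, if your cluster's master node were named my-cluster-m (a hypothetical name; Dataproc names the master node after the cluster with an -m suffix) in zone us-central1-a, the command would look like:

gcloud compute ssh --zone=us-central1-a \
  --ssh-flag="-D 1080" --ssh-flag="-N" --ssh-flag="-n" my-cluster-m

Here -D 1080 opens a SOCKS proxy on local port 1080, -N skips running a remote command, and -n prevents reading from stdin, so the command simply holds the tunnel open.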
Once you have the SSH tunnel in place, you need to configure a browser to use a SOCKS proxy. Assuming you're using Chrome and know the path to Chrome on your system, you can launch Chrome with a SOCKS proxy using:
<Google Chrome executable path> \
--proxy-server="socks5://localhost:1080" \
--host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" \
--user-data-dir=/tmp/
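Once Chrome is running through the proxy, browse to the web UIs using the master node's hostname. On a default Dataproc cluster the history servers should be listening on their usual ports (the hostname below is a placeholder for your actual master node name):

http://<master-host-name>:18080   # Spark history server
http://<master-host-name>:19888   # Hadoop MapReduce job history server
http://<master-host-name>:8088    # YARN ResourceManager (running applications)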
The full details on how to do this can be found in the Dataproc documentation on connecting to cluster web interfaces.
Source: https://stackoverflow.com/questions/33836067/in-dataproc-how-can-i-access-the-spark-and-hadoop-job-history