Multi node cluster installation with h2o on AWS EC2

Posted by 别来无恙 on 2020-05-29 04:16:07

Question


I would like to know how to set up an H2O cluster using multiple AWS EC2 instances and RStudio. I am not a computer scientist, so apologies if these questions are trivial (!)

Based on this tutorial (http://amunategui.github.io/h2o-on-aws/) I successfully installed H2O and RStudio on an AWS EC2 instance (Linux). But I would rather create a multi-instance cluster with, let's say, 4 instances with 8 cores each.

Following this document (http://h2o-release.s3.amazonaws.com/h2o/rel-lambert/5/docs-website/deployment/multinode.html), I need a flatfile.txt that lists the IPs and ports of every EC2 instance. Next, I have to copy this file to each node in the cluster, and then start the cluster via the java command line. Since, as I mentioned, I am not a computer scientist, some questions came up:

  1. Where do I find the IPs and ports of each h2o instance?
  2. How exactly can I copy the resulting file to each node?
  3. From step 5 on I am completely confused: where do I have to enter this line, and where can I find the java command line?
  4. I don't want to use the H2O Web UI, so how can I access the cluster from RStudio (installed on one of the instances)?

Thank you so much in advance!


Answer 1:


1a. Where to get the IPs? AWS shows them to you as you create each EC2 instance. It is the private IP you want (normally starting with 172.). (BTW, make sure you create all the instances in the same availability zone.)

1b. Use 54321 as the port. So your flatfile.txt for a 3-node cluster might look like:

172.31.1.123:54321
172.31.2.237:54321
172.31.3.99:54321
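
If you would rather not type the file by hand, here is a minimal sketch that writes it from a list of private IPs. The addresses are placeholders and the names NODE_IPS, H2O_PORT and make_flatfile are just ones I made up for illustration; substitute the private IPs of your own instances.

```shell
#!/bin/sh
# Placeholder private IPs: replace with the private IPs of your EC2 instances.
NODE_IPS="172.31.1.123 172.31.2.237 172.31.3.99"
# The H2O port used throughout this answer.
H2O_PORT=54321

# Print one "ip:port" line per node (hypothetical helper, not part of H2O).
make_flatfile() {
    for ip in $NODE_IPS; do
        printf '%s:%s\n' "$ip" "$H2O_PORT"
    done
}

# Write the file into the current directory and show it for a quick check.
make_flatfile > flatfile.txt
cat flatfile.txt
```

Every node gets the same file, so you only need to generate it once.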

2. You could create flatfile.txt on your laptop, then scp it to the home directory on each node. (Use the public IP for scp.)
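
As a rough sketch, the copy could be a small loop. Everything specific here is an assumption: the key file name my-key.pem, the login user ec2-user (it depends on your AMI), and the public IPs, which are documentation-range placeholders. By default the script only prints the scp commands so you can check them first.

```shell
#!/bin/sh
# Hypothetical values: replace with your own key file, login user and public IPs.
KEY_FILE="my-key.pem"
REMOTE_USER="ec2-user"
PUBLIC_IPS="203.0.113.10 203.0.113.11 203.0.113.12"

# RUN defaults to "echo", so the commands are printed rather than executed.
# Set RUN to the empty string (RUN= sh this_script.sh) to really run them.
RUN="${RUN-echo}"

# Copy flatfile.txt into the home directory of every node.
copy_flatfile() {
    for host in $PUBLIC_IPS; do
        $RUN scp -i "$KEY_FILE" flatfile.txt "$REMOTE_USER@$host:~/"
    done
}

copy_flatfile
```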

3. SSH in to each machine in turn, and then run the java command from the home directory, e.g.:

 java -Xmx20g -jar h2o.jar -flatfile flatfile.txt -port 54321
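
If you start H2O like that in an interactive session, it will normally stop when you log out. One possible way around that, sketched below, is to send the same command over ssh wrapped in nohup. The key file, user name and public IPs are again placeholders, and the log file name h2o.log is my own choice. As written, the script only prints the ssh commands (the printed preview drops the quotes around the remote command).

```shell
#!/bin/sh
# Hypothetical values: replace with your own key file, login user and public IPs.
KEY_FILE="my-key.pem"
REMOTE_USER="ec2-user"
PUBLIC_IPS="203.0.113.10 203.0.113.11 203.0.113.12"

# The java command from this answer, detached from the terminal so it keeps
# running after ssh returns; output goes to h2o.log in the home directory.
H2O_CMD='nohup java -Xmx20g -jar h2o.jar -flatfile flatfile.txt -port 54321 > h2o.log 2>&1 < /dev/null &'

# RUN defaults to "echo" (print only); set RUN to the empty string to execute.
RUN="${RUN-echo}"

# Start H2O on every node, one ssh call per node.
start_h2o() {
    for host in $PUBLIC_IPS; do
        $RUN ssh -i "$KEY_FILE" "$REMOTE_USER@$host" "$H2O_CMD"
    done
}

start_h2o
```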

4. First make sure port 8787 (RStudio Server's default port) is open in your Amazon firewall (aka "security group"). Once you've made sure the H2O cluster is running, and assuming you have installed the H2O R package in exactly the same version as H2O on each node in your cluster, you simply do:

library(h2o)
h2o.init()

By default, h2o.init() looks on the local machine for a running H2O node; connecting to any one node gives you access to the whole cluster.


Aside:

What I have been using are the scripts found here:

https://github.com/h2oai/h2o-3/tree/master/ec2

They do almost all the steps for you, including making the flatfile, distributing it, and starting H2O on each node. You still need to set up a security group (well, optionally, I suppose: the script default is to have no security group!), and you need to set a password for the user you will log in to RStudio as. You also need to install the H2O R package (I think that could be done from inside RStudio, if you have an aversion to the command line).



Source: https://stackoverflow.com/questions/38351835/multi-node-cluster-installation-with-h2o-on-aws-ec2
