How can I increase the h2o startup timeout when starting an h2o server via R?
I have a multinode AWS EC2 cluster, where I start a separate h2o server on each node. After startup, some EC2 nodes can be a bit slow, and I'd rather increase the timeout than re-run the h2o initialization code on these nodes.
What I am currently doing is along the lines of:
library(doParallel)
library(foreach)
workers=parallel::makePSOCKcluster(workerIPs,master=masterIP)
registerDoParallel(workers)
foreach(i=seq_along(workers),.inorder=FALSE,.multicombine=TRUE) %dopar% {
  library(h2o)
  h2o.init(nthreads=-1)
  paste0(capture.output(h2o.clusterStatus()),collapse="\n")
}
On slow nodes, h2o.clusterStatus() throws an error if h2o.init(nthreads=-1) timed out.
BTW: I am using h2o v3.10.4.4 on Ubuntu 16.04.
So, I looked at the h2o source code on GitHub, and there does not seem to be a timeout argument (neither in the R code nor in the underlying Java code). There is a Java argument called session_timeout, but I don't think it applies to my problem.
So what I did is this:
foreach(i=seq_along(workers),.inorder=FALSE,.multicombine=TRUE) %dopar% {
  library(h2o)
  startCounter=1
  startCounterMax=3
  # Check the counter first: && short-circuits, so at most startCounterMax
  # attempts are made and a success on the last attempt is not discarded.
  while((startCounter<=startCounterMax)&&inherits(clusterStatus<-try({
    h2o.init(nthreads=-1)
    capture.output(h2o.clusterStatus())
  },silent=TRUE),"try-error")) {
    startCounter=startCounter+1
  }
  if (startCounter>startCounterMax) stop("Failed to start h2o server for ",
    startCounterMax," successive times")
  return(clusterStatus)
}
Not very nice but it does the job.
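The retry idea can also be pulled out into a small helper that pauses between attempts, which may give a slow node a bit more time to come up. This is only a minimal sketch: retry, max_attempts and wait_secs are hypothetical names of my own, not part of h2o, and the h2o usage is left commented out because it needs the h2o package and Java on each worker.

```r
# Hypothetical helper (not part of h2o): call `fn` until it succeeds or
# `max_attempts` is reached, sleeping `wait_secs` between failed attempts.
retry <- function(fn, max_attempts = 3, wait_secs = 5) {
  for (attempt in seq_len(max_attempts)) {
    # Turn an error into a return value so the loop can inspect it.
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < max_attempts) Sys.sleep(wait_secs)
  }
  stop("Failed after ", max_attempts, " attempts: ", conditionMessage(result))
}

# Intended use inside the foreach body on each worker:
# status <- retry(function() {
#   h2o::h2o.init(nthreads = -1)
#   capture.output(h2o::h2o.clusterStatus())
# })
```

Using tryCatch instead of try keeps the last error message, so the final stop() can report why the server did not start.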
If you are trying to form a cluster of several H2O nodes (say, a cluster of 3 H2O nodes with one node per machine) and you want to wait for a specified time, you can do that in Java code: water.H2O.waitForCloudSize(3, 50 * 1000/*ms*/);
I assume a corresponding parameter is available in R as well.
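I have not confirmed such a parameter on the R side, so as a rough stand-in, here is a minimal sketch of polling with a deadline in plain R. wait_until and its arguments are hypothetical names, not an h2o API, and the commented usage assumes that h2o.clusterStatus() returns one row per node, which I have not verified for this version.

```r
# Hypothetical helper (not part of h2o): poll `condition` every `interval_secs`
# until it returns TRUE or `timeout_secs` have elapsed.
wait_until <- function(condition, timeout_secs = 50, interval_secs = 1) {
  deadline <- Sys.time() + timeout_secs
  repeat {
    # An error while polling is treated as "not ready yet".
    ok <- tryCatch(isTRUE(condition()), error = function(e) FALSE)
    if (ok) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval_secs)
  }
}

# Possible use, ASSUMING h2o.clusterStatus() returns one row per node:
# ready <- wait_until(function() nrow(h2o::h2o.clusterStatus()) >= 3,
#                     timeout_secs = 50)
```

The helper returns FALSE on timeout rather than raising an error, so the caller can decide whether to retry or give up.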
Source: https://stackoverflow.com/questions/43515062/increase-h2o-init-timeout