Question
At first I thought it was a random issue, but it happens again when I re-run the script.
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix, :
Unexpected CURL error: Recv failure: Connection reset by peer
I'm doing a grid search with a Gradient Boosting Machine model on a medium-sized dataset (roughly 40000 x 30). The largest number of trees (ntrees) in the grid is 1000. The error usually appears after the script has been running for a couple of hours. I set max_mem_size to 30 GB.
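For context, the max_mem_size cap mentioned above is the one passed to h2o.init() when the local cluster is started; a minimal sketch of that call (the nthreads argument is an assumption for illustration, not something stated in the question):
library(h2o)
# Start (or connect to) a local H2O instance with a 30 GB heap cap.
h2o.init(max_mem_size = "30g", nthreads = -1)   # nthreads = -1 uses all available cores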
for (k in 1:nrow(par.grid)) {
  hg = h2o.gbm(training_frame = Xtr.hf,
               validation_frame = Xt.hf,
               distribution = "huber",
               huber_alpha = HuberAlpha,
               x = 2:ncol(Xtr.hf),
               y = 1,
               ntrees = par.grid[k, "ntree"],
               max_depth = depth,
               learn_rate = par.grid[k, "shrink"],
               min_rows = par.grid[k, "min_leaf"],
               sample_rate = samp_rate,
               col_sample_rate = c_samp_rate,
               nfolds = 5,
               model_id = p(iname, "_gbm_CV")
  )
  cv_result[k,1] = h2o.mse(hg, train = TRUE)
  cv_result[k,2] = h2o.mse(hg, valid = TRUE)
}
Answer 1:
Try adding gc() in your innermost loop. Even better would be to explicitly use h2o.rm().
So, it would become something like:
for (k in 1:nrow(par.grid)) {
  hg = h2o.gbm(...stuff...,
               model_id = p(iname, "_gbm_CV")
  )
  cv_result[k,1] = h2o.mse(hg, train = TRUE)
  cv_result[k,2] = h2o.mse(hg, valid = TRUE)
  h2o.rm(hg); rm(hg); gc()
}
Theoretically this shouldn't matter, but if R holds on to the reference, then H2O will too.
If you think you might want to investigate any models further, and you have plenty of local disk space, you could do h2o.saveModel() before your h2o.mse() calls. (You'll need to specify a filename that somehow summarizes all your parameters, of course...)
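A minimal sketch of how that could look inside the loop (the directory name and the parameter-encoding model_id below are illustrative assumptions, not something given in the answer):
# Illustrative only: persist the model to local disk before freeing it.
# h2o.saveModel() writes the model into `path` under its model_id, so
# encoding the grid row k (or the grid parameters) in the id keeps each
# saved file distinct.
hg = h2o.gbm(...stuff...,
             model_id = paste0(iname, "_gbm_CV_k", k))
h2o.saveModel(hg, path = "gbm_models", force = TRUE)   # "gbm_models" is a placeholder directory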
UPDATE based on comment: If you do not need to keep any models or data, then using h2o.removeAll() is another way to rapidly reclaim all the memory. (This approach is also worth considering if any data or models you do need preserved are quick and easy to re-load.)
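A minimal sketch of that pattern, assuming the training and validation frames can simply be re-imported from disk (the file paths below are placeholders, not from the original post):
# Wipe every frame and model from the H2O cluster in one call,
# then re-load only the data that is still needed.
h2o.removeAll()
Xtr.hf = h2o.importFile("train_data.csv")   # placeholder path
Xt.hf  = h2o.importFile("valid_data.csv")   # placeholder path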
Source: https://stackoverflow.com/questions/45360398/r-h2o-server-curl-error-kind-of-repeatable