R h2o server CURL error, kind of repeatable

Posted by 狂风中的少年 on 2021-01-28 07:11:04

Question


At first I thought it was a random issue, but it happens again whenever I re-run the script.

Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix,  : 
Unexpected CURL error: Recv failure: Connection reset by peer

I'm doing a grid search on a medium-size dataset (roughly 40000 x 30) with a Gradient Boosting Machine model. The largest number of trees in the grid is 1000. The error usually appears after the script has been running for a couple of hours. I set max_mem_size to 30 GB.

for ( k in 1:nrow(par.grid)) {
    hg = h2o.gbm(training_frame = Xtr.hf, 
                 validation_frame = Xt.hf,
                 distribution="huber",
                 huber_alpha = HuberAlpha,
                 x=2:ncol(Xtr.hf),        
                 y=1,                     
                 ntrees = par.grid[k,"ntree"],
                 max_depth = depth,
                 learn_rate = par.grid[k,"shrink"],
                 min_rows = par.grid[k,"min_leaf"],
                 sample_rate = samp_rate,
                 col_sample_rate = c_samp_rate,
                 nfolds = 5,
                 model_id = p(iname, "_gbm_CV")
                 )
    cv_result[k,1] = h2o.mse(hg, train=TRUE)
    cv_result[k,2] = h2o.mse(hg, valid=TRUE)
  }

Answer 1:


Try adding gc() in your innermost loop. Even better would be to explicitly use h2o.rm().

So, it would become something like:

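for ( k in 1:nrow(par.grid)) {
  hg = h2o.gbm(...stuff...,
             model_id = p(iname, "_gbm_CV")
             )
  cv_result[k,1] = h2o.mse(hg, train=TRUE)
  cv_result[k,2] = h2o.mse(hg, valid=TRUE)
  h2o.rm(hg)  # remove the model from the H2O cluster
  rm(hg)      # drop the R-side reference to it
  gc()        # ask R to garbage-collect
}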

Theoretically this shouldn't matter, but if R holds on to the reference, then H2O will too.

If you think you might want to investigate any models further, and you have plenty of local disk space, you could do h2o.saveModel() before your h2o.mse() calls. (You'll need to specify a filename that somehow summarizes all your parameters, of course...)
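
A minimal sketch of that idea, assuming the objects from the question (par.grid, Xtr.hf, Xt.hf, iname) exist and an H2O cluster is running. The helper names make_model_id and fit_and_save, the save_dir folder, and the id format are hypothetical. The sketch puts the parameter summary into model_id, since h2o.saveModel() names the saved file after the model id; the remaining h2o.gbm() arguments from the question are left out for brevity.

```r
# Hypothetical helper: build an id that encodes one row of the parameter grid.
# Dots are replaced with "p" so the id stays a plain file-name-friendly string.
make_model_id <- function(prefix, ntree, shrink, min_leaf) {
  id <- paste0(prefix, "_gbm_nt", ntree, "_lr", shrink, "_ml", min_leaf)
  gsub(".", "p", id, fixed = TRUE)
}

# Sketch of one grid iteration: fit, save to disk, read the metrics, free the model.
# Assumes par.grid, Xtr.hf, Xt.hf and iname from the question are in scope.
fit_and_save <- function(k, save_dir = "saved_models") {
  id <- make_model_id(iname,
                      par.grid[k, "ntree"],
                      par.grid[k, "shrink"],
                      par.grid[k, "min_leaf"])
  hg <- h2o::h2o.gbm(training_frame = Xtr.hf,
                     validation_frame = Xt.hf,
                     x = 2:ncol(Xtr.hf),
                     y = 1,
                     ntrees = par.grid[k, "ntree"],
                     learn_rate = par.grid[k, "shrink"],
                     min_rows = par.grid[k, "min_leaf"],
                     nfolds = 5,
                     model_id = id)
  # Save first, so the model survives on disk even if a later call fails
  h2o::h2o.saveModel(object = hg, path = save_dir, force = TRUE)
  res <- c(train = h2o::h2o.mse(hg, train = TRUE),
           valid = h2o::h2o.mse(hg, valid = TRUE))
  h2o::h2o.rm(hg)  # the copy on disk is kept; only the cluster copy is removed
  res
}
```

Inside the loop this would be used as cv_result[k, 1:2] = fit_and_save(k), and a saved model can later be read back with h2o.loadModel().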

UPDATE based on comment: If you do not need to keep any models or data, then using h2o.removeAll() is another way to rapidly reclaim all the memory. (This approach is also worth considering if any data or models you do need preserved are quick and easy to re-load.)
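
A rough sketch of the h2o.removeAll() variant, assuming h2o.init() has already been called. Because h2o.removeAll() deletes frames as well as models, the sketch re-uploads the data on every call; train_df and test_df are hypothetical R data.frames kept on the R side, and fit_then_wipe is a made-up name. Only a few of the h2o.gbm() arguments from the question are shown.

```r
# Sketch only: fit one model, copy its metrics into R, then wipe the cluster.
# `train_df` and `test_df` are hypothetical data.frames with the response in column 1.
fit_then_wipe <- function(train_df, test_df, ntrees, learn_rate, min_rows) {
  # Re-upload the data, because h2o.removeAll() below deletes frames too
  tr <- h2o::as.h2o(train_df)
  te <- h2o::as.h2o(test_df)
  hg <- h2o::h2o.gbm(training_frame = tr,
                     validation_frame = te,
                     x = 2:ncol(train_df),
                     y = 1,
                     ntrees = ntrees,
                     learn_rate = learn_rate,
                     min_rows = min_rows,
                     nfolds = 5)
  # Copy the metrics into plain R numbers before anything is removed
  res <- c(train = h2o::h2o.mse(hg, train = TRUE),
           valid = h2o::h2o.mse(hg, valid = TRUE))
  h2o::h2o.removeAll()  # removes all models and frames from the cluster
  res
}
```

Re-uploading roughly 40000 x 30 values on each iteration costs some time, so this only pays off when, as the update above says, the data is quick to re-load.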



Source: https://stackoverflow.com/questions/45360398/r-h2o-server-curl-error-kind-of-repeatable
