Consistent results with multiple runs of h2o deeplearning

Submitted by 早过忘川 on 2019-12-25 04:23:01

Question


For a certain combination of parameters in the deeplearning function of h2o, I get different results each time I run it.

# Note: the h2o R API spells the dropout parameter "hidden_dropout_ratios" (plural)
args <- list(list(hidden = c(200, 200, 200),
                  loss = "CrossEntropy",
                  hidden_dropout_ratios = c(0.1, 0.1, 0.1),
                  activation = "RectifierWithDropout",
                  epochs = EPOCHS))

run <- function(extra_params) {
  # merge the shared parameters with the per-run extras, then call h2o.deeplearning
  model <- do.call(h2o.deeplearning,
                   modifyList(list(x = columns, y = "Response",
                                   training_frame = training,
                                   validation_frame = validation,
                                   distribution = "multinomial",
                                   l1 = 1e-5, balance_classes = TRUE),
                              extra_params))
}

model <- lapply(args, run)

What would I need to do in order to get consistent results for the model each time I run this?


Answer 1:


Deep learning with H2O is not reproducible when it runs on more than a single core: the results and performance metrics may vary slightly each time you train the model. The implementation in H2O uses a technique called "Hogwild!", which speeds up training at the cost of reproducibility across multiple cores.

So if you want reproducible results, you will need to restrict H2O to run on a single core and make sure to pass a seed in the h2o.deeplearning call.

Edit, based on a comment by Darren Cook: I forgot to include the reproducible = TRUE parameter, which must be set in combination with the seed to make training truly reproducible. Note that this makes training much slower, and it is not advisable on a large dataset.
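Putting the answer together, a minimal sketch in R might look like the following. It assumes the h2o package is installed and reuses the names from the question (columns, training, validation, EPOCHS); the seed value 1234 is arbitrary.

```r
library(h2o)

# Restrict the H2O cluster to a single core
h2o.init(nthreads = 1)

model <- h2o.deeplearning(
  x = columns, y = "Response",
  training_frame = training,
  validation_frame = validation,
  hidden = c(200, 200, 200),
  activation = "RectifierWithDropout",
  hidden_dropout_ratios = c(0.1, 0.1, 0.1),
  loss = "CrossEntropy",
  distribution = "multinomial",
  l1 = 1e-5, balance_classes = TRUE,
  epochs = EPOCHS,
  seed = 1234,          # fix the RNG seed
  reproducible = TRUE   # force deterministic (single-threaded) training
)
```

With seed and reproducible = TRUE set together, repeated runs on the same data should produce identical models, at the cost of much slower training.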

More information on "Hogwild!"



Source: https://stackoverflow.com/questions/40827940/consisten-results-with-multiple-runs-of-h2o-deeplearning
