Question
I have a very noisy dataset with 2000 observations and 42 features (financial data), and I'm performing binary classification. I'm tuning the network using h2o.grid and providing a validation set. I've set epochs=1000 and I stop training when the misclassification error does not improve by at least 1% over 5 scoring events (stopping_rounds=5, stopping_tolerance=0.01). I want to know the value of epochs that minimises the validation error.
hyper_params = list(rho = c(0.9, 0.95, 0.99),
                    epsilon = 10^(c(-10, -8, -6, -4)),
                    hidden = list(c(64, 64)),
                    activation = c("Tanh", "Rectifier", "RectifierWithDropout"))

grid = h2o.grid("deeplearning", x = predictors, y = response,
                training_frame = tempTrain, validation_frame = tempValid,
                grid_id = "h2oGrid10", hyper_params = hyper_params,
                adaptive_rate = TRUE, stopping_metric = "misclassification",
                variable_importances = TRUE, epochs = 1000,
                stopping_rounds = 5, stopping_tolerance = 0.01, max_w2 = 20)
According to this question, the solution should be the following:
gridErr = h2o.getGrid("h2oGrid10", sort_by="err", decreasing=FALSE)
best_model = h2o.getModel(gridErr@model_ids[[1]])
solution = rev(best_model@model$scoring_history$epochs)[1]
This gives solution=1000. However, inspecting the scoring_history we observe the following output, which is quite ambiguous.
cbind(best_model@model$scoring_history$epochs,
      best_model@model$scoring_history$validation_classification_error)

     [,1]      [,2]
[1,]    0       NaN
[2,]   10 0.4971347
[3,]  160 0.4813754
[4,]  320 0.4770774
[5,]  490 0.4799427
[6,]  660 0.4727794
[7,]  840 0.4713467
[8,] 1000 0.4727794
[9,] 1000 0.4713467
In fact, the global minimum of the validation error is reached both at 840 epochs and at 1000 epochs. I've tried different settings and the "optimal" number of epochs always comes out equal to the initially set number of epochs. Furthermore, I'm quite surprised to see such a large optimal number of epochs given the conservative values stopping_rounds=5 and stopping_tolerance=0.01, so I'm wondering whether I'm missing something important. How do I retrieve the optimal number of epochs, possibly on a finer scale (i.e. 1, 2, ... rather than 10, 160, ...)?
EDIT: The answer is in slide 8 here. What happens is that the best model is overwritten when performing the last iteration. Anyway, I've played for a while with the parameter train_samples_per_iteration
but I'm not still able to observe the evolution of the validation error with the number of epochs in a finer scale. Any idea?
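For reference, this is the kind of grid call I've been experimenting with to get one scoring event per epoch (just a sketch under my assumptions: setting train_samples_per_iteration to the number of training rows should make one iteration correspond to one epoch, and score_interval=0 with score_duty_cycle=1 should remove the time-based throttling of scoring events; the grid_id is made up):

# Sketch: try to force roughly one scoring event per epoch so the validation
# error is recorded on a finer scale than 10, 160, 320, ...
grid_fine = h2o.grid("deeplearning", x = predictors, y = response,
                     training_frame = tempTrain, validation_frame = tempValid,
                     grid_id = "h2oGrid10fine", hyper_params = hyper_params,
                     adaptive_rate = TRUE, stopping_metric = "misclassification",
                     epochs = 1000, stopping_rounds = 5, stopping_tolerance = 0.01,
                     max_w2 = 20,
                     train_samples_per_iteration = nrow(tempTrain),  # one iteration == one epoch
                     score_interval = 0,    # no minimum waiting time between scoring events
                     score_duty_cycle = 1)  # allow scoring to use as much time as needed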
Source: https://stackoverflow.com/questions/39207125/how-to-find-the-validation-error-as-a-function-of-the-number-of-epochs-on-a-fine