Question
I want to tune a neural network with dropout using h2o in R. Here I provide a reproducible example for the iris dataset. I'm avoiding tuning eta and epsilon (i.e. the ADADELTA hyper-parameters) with the sole purpose of making computations faster.
require(h2o)
h2o.init()
data(iris)
iris = iris[sample(1:nrow(iris)), ]
irisTrain = as.h2o(iris[1:90, ])
irisValid = as.h2o(iris[91:120, ])
irisTest = as.h2o(iris[121:150, ])
hyper_params <- list(
  input_dropout_ratio = list(0, 0.15, 0.3),
  hidden_dropout_ratios = list(0, 0.15, 0.3, c(0,0), c(0.15,0.15), c(0.3,0.3)),
  hidden = list(64, c(32,32)))
grid = h2o.grid("deeplearning", x=colnames(iris)[1:4], y=colnames(iris)[5],
                training_frame = irisTrain, validation_frame = irisValid,
                hyper_params = hyper_params, adaptive_rate = TRUE,
                variable_importances = TRUE, epochs = 50, stopping_rounds=5,
                stopping_tolerance=0.01, activation=c("RectifierWithDropout"),
                seed=1, reproducible=TRUE)
The output is:
Details: ERRR on field: _hidden_dropout_ratios: Must have 1 hidden layer dropout ratios.
The problem is in hidden_dropout_ratios. Note that I'm including 0 for input_dropout_ratio and hidden_dropout_ratios, since I also want to test the activation function without dropout. I'm aware that I could use activation="Rectifier", but I think my configuration should lead to the same result. How do I tune hidden_dropout_ratios when tuning architectures with different numbers of hidden layers?
Attempt 1: Unsuccessful, and I'm not tuning hidden.
hyper_params <- list(
  input_dropout_ratio = c(0, 0.15, 0.3),
  hidden_dropout_ratios = list(c(0.3,0.3), c(0.5,0.5)),
  hidden = c(32,32))
ERRR on field: _hidden_dropout_ratios: Must have 1 hidden layer dropout ratios.
Attempt 2: Successful, but I'm not tuning hidden.
hyper_params <- list(
  input_dropout_ratio = c(0, 0.15, 0.3),
  hidden_dropout_ratios = c(0.3,0.3),
  hidden = c(32,32))
Answer 1:
You have to fix the number of hidden layers in a grid if you are experimenting with hidden_dropout_ratios. At first I messed around with combining multiple grids; then, when researching for my H2O book, I saw someone mention, in passing, that grids get combined automatically if you give them the same name. So you still need to call h2o.grid() once per number of hidden layers, but the results can all end up in the same grid. Here is your example modified accordingly:
require(h2o)
h2o.init()
data(iris)
iris = iris[sample(1:nrow(iris)), ]
irisTrain = as.h2o(iris[1:90, ])
irisValid = as.h2o(iris[91:120, ])
irisTest = as.h2o(iris[121:150, ])
hyper_params1 <- list(
  input_dropout_ratio = c(0, 0.15, 0.3),
  hidden_dropout_ratios = list(0, 0.15, 0.3),
  hidden = list(64)
)
hyper_params2 <- list(
  input_dropout_ratio = c(0, 0.15, 0.3),
  hidden_dropout_ratios = list(c(0,0), c(0.15,0.15), c(0.3,0.3)),
  hidden = list(c(32,32))
)
grid = h2o.grid("deeplearning", x=colnames(iris)[1:4], y=colnames(iris)[5],
                grid_id = "stackoverflow",
                training_frame = irisTrain, validation_frame = irisValid,
                hyper_params = hyper_params1, adaptive_rate = TRUE,
                variable_importances = TRUE, epochs = 50, stopping_rounds=5,
                stopping_tolerance=0.01, activation=c("RectifierWithDropout"),
                seed=1, reproducible=TRUE)
grid = h2o.grid("deeplearning", x=colnames(iris)[1:4], y=colnames(iris)[5],
                grid_id = "stackoverflow",
                training_frame = irisTrain, validation_frame = irisValid,
                hyper_params = hyper_params2, adaptive_rate = TRUE,
                variable_importances = TRUE, epochs = 50, stopping_rounds=5,
                stopping_tolerance=0.01, activation=c("RectifierWithDropout"),
                seed=1, reproducible=TRUE)
When I went to print the grid, I was reminded there is a bug with grid output when using list hyper-parameters, such as hidden or hidden_dropout_ratios. Your code is a nice self-contained example, so I'll report that now. In the meantime, here is a snippet to show the values of those hyper-parameters for each model (fetching the models from the combined grid first):
grid <- h2o.getGrid("stackoverflow")
models <- lapply(grid@model_ids, h2o.getModel)
sapply(models, function(m) c(
  paste(m@parameters$hidden, collapse = ","),
  paste(m@parameters$hidden_dropout_ratios, collapse = ",")
))
Which gives:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "32,32" "64" "32,32" "64" "32,32" "64"
[2,] "0,0" "0" "0.15,0.15" "0.15" "0.3,0.3" "0.3"
I.e. no hidden dropout is better than a little, which is better than a lot. And two hidden layers is better than one.
By the way:
input_dropout_ratio: controls dropout between the input layer and the first hidden layer. It can be used independently of the activation function.
hidden_dropout_ratios: controls dropout between each hidden layer and the next layer (which is either the next hidden layer or the output layer). If specified, you must use one of the "WithDropout" activation functions.
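In other words, the length of hidden_dropout_ratios must equal the number of hidden layers. A minimal sketch (the values here are arbitrary, just to show the shape):

```r
# One dropout ratio per hidden layer, and dropout requires
# a "*WithDropout" activation:
hidden = c(200, 200)                 # two hidden layers...
hidden_dropout_ratios = c(0.5, 0.5)  # ...so exactly two ratios
activation = "RectifierWithDropout"
```

This is also why the grid must fix the number of hidden layers: a single hidden_dropout_ratios vector can only be valid for one layer count at a time.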
Source: https://stackoverflow.com/questions/39212635/how-to-tune-hidden-dropout-ratios-in-h2o-grid-in-r