Question
Regarding the question "h2o.glm lambda search not appearing to iterate over all lambdas", I read it as complaining that the selected lambda was too high; the asker tried setting early_stopping=F in the hope that it would fix that "bug".
Isn't the original behaviour a feature rather than a bug? And if that is correct, then you should always use early_stopping=T when using cross-validation with GLM; otherwise the error estimate from cross-validation is useless, and you also risk over-fitting.
(My main question is whether my understanding of how GLM and CV work together is correct, but I'd also be interested in anything else to watch out for when using lambda_search and cross-validation together.)
Answer 1:
H2O's GLM with lambda search and cross-validation should always pick the best lambda based on cross-validation and use it in the returned (main) model. The early stopping option should have no effect on the selected lambda. Its purpose is to skip computing models for lambdas smaller than the best one, since they are not needed for the main model. We still compute models for all lambdas larger than the best one, because the search starts at the largest lambda and moves downward, which lets us use warm starting and take full advantage of strong rules.
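As a rough illustration of the mechanism described above, here is a minimal NumPy sketch. It is not H2O's implementation: it uses a plain lasso (squared-error loss, no intercept, coordinate descent) instead of H2O's elastic-net GLM solver, it does not implement strong rules, and its stopping rule (stop after a hypothetical patience of 3 lambdas without improvement in mean CV error) and all function names are assumptions made for the example. It walks a decreasing lambda path with per-fold warm starts and shows that early stopping only truncates the part of the path below the best lambda found so far; the lambdas that were computed get exactly the same CV errors either way.

```python
import numpy as np


def soft_threshold(z, t):
    """Lasso proximal operator: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, beta0, n_iter=100):
    """Coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1,
    starting from beta0 (the warm start from the previous, larger lambda)."""
    n, p = X.shape
    beta = beta0.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]          # drop coordinate j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]          # add the updated coordinate back
    return beta


def cv_lambda_search(X, y, nlambdas=20, min_ratio=1e-3, nfolds=5,
                     early_stopping=True, patience=3, seed=0):
    """Hypothetical CV lambda search over a decreasing path with warm starts."""
    n, p = X.shape
    lam_max = np.max(np.abs(X.T @ y)) / n        # smallest lambda giving all-zero coefs
    lambdas = lam_max * np.geomspace(1.0, min_ratio, nlambdas)  # decreasing path
    folds = np.random.default_rng(seed).permutation(n) % nfolds
    betas = [np.zeros(p) for _ in range(nfolds)]  # per-fold warm-start state
    cv_err, best_i, since_best = [], 0, 0
    for i, lam in enumerate(lambdas):
        errs = []
        for k in range(nfolds):
            train, test = folds != k, folds == k
            betas[k] = lasso_cd(X[train], y[train], lam, betas[k])
            errs.append(np.mean((y[test] - X[test] @ betas[k]) ** 2))
        cv_err.append(np.mean(errs))
        if i == 0 or cv_err[i] < cv_err[best_i]:
            best_i, since_best = i, 0
        else:
            since_best += 1
        # Breaking here only skips lambdas smaller than the current best one.
        if early_stopping and since_best >= patience:
            break
    # Main model: fit on all data, warm-starting down the path to the best lambda.
    beta = np.zeros(p)
    for lam in lambdas[: best_i + 1]:
        beta = lasso_cd(X, y, lam, beta)
    return lambdas[: len(cv_err)], np.array(cv_err), best_i, beta


# Demo on synthetic data with 3 informative predictors out of 8.
rng = np.random.default_rng(42)
X = rng.standard_normal((120, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0]) + rng.standard_normal(120)
y = y - y.mean()

lam_es, err_es, best_es, beta_es = cv_lambda_search(X, y, early_stopping=True)
lam_all, err_all, best_all, beta_all = cv_lambda_search(X, y, early_stopping=False)
print(len(lam_es), "of", len(lam_all), "lambdas computed with early stopping")
print("selected lambda:", lam_es[best_es], "vs", lam_all[best_all])
```

In this sketch the selected lambda matches the full search whenever the CV minimum is reached before the patience window runs out; with a noisy CV curve, a patience-based rule like this one can in principle stop at a local minimum, which computing the full path rules out at the cost of fitting every lambda.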
With early_stopping set to false, I would expect models to be computed for all lambdas, in case the user wants to inspect them or do custom model selection.
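Once every lambda has been fitted, the per-lambda CV results can feed whatever selection rule the user prefers. As one hypothetical example (not an H2O option), here is the one-standard-error rule, the rule behind glmnet's lambda.1se: take the largest lambda whose mean CV error is within one standard error of the minimum. The sketch assumes you have already extracted, for each lambda, the mean CV error and its standard error; the function name and the sample numbers are made up for illustration. With the full path available, the minimum is taken over every lambda rather than only those evaluated before stopping.

```python
def one_se_rule(lambdas, cv_mean, cv_se):
    """Return the index of the largest lambda whose mean CV error is within
    one standard error of the minimum mean CV error (hypothetical helper)."""
    best = min(range(len(cv_mean)), key=lambda i: cv_mean[i])
    threshold = cv_mean[best] + cv_se[best]
    within = [i for i in range(len(lambdas)) if cv_mean[i] <= threshold]
    # Larger lambda means more regularization, i.e. a simpler model.
    return max(within, key=lambda i: lambdas[i])


# Hypothetical CV results for a decreasing lambda path.
lambdas = [1.0, 0.5, 0.25, 0.125]
cv_mean = [2.0, 1.1, 1.0, 1.05]
cv_se = [0.1, 0.1, 0.15, 0.1]
idx = one_se_rule(lambdas, cv_mean, cv_se)
print(idx, lambdas[idx])  # prints: 1 0.5
```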
Source: https://stackoverflow.com/questions/45948642/what-do-you-need-to-watch-out-for-when-using-cross-validation-with-glm-lambda-se