Question
I have read several threads here about the differences between h2o.predict() and h2o.performance() (for example, the one linked below):
How to interpret the probabilities (p0, p1) of the result of h2o.predict()
Can someone tell me which threshold h2o.predict() uses? Is it max f1? If so, is it the threshold from the training data, the validation data, or cross-validation?
I tried applying the validation thresholds for max f1 and max f0point5 to the testing set (completely separate from the training and validation data), but the class predicted by h2o.predict() and the class obtained from the threshold do not match completely.
The closest match I got came from taking the max f0point5 threshold from training and applying it to the testing set.
There is not much documentation on h2o.predict(). Also, is there a best practice for choosing a threshold, e.g. the mean of the validation and training thresholds?
Thanks in advance!
Answer 1:
Here are the specifics of how the prediction threshold is selected when a user runs h2o.predict() or .predict():
1) If you train a model with only training data, the max F1 threshold from the training data model metrics is used.
2) If you train a model with training and validation data, the max F1 threshold from the validation data model metrics is used.
3) If you train a model with training data and set the nfolds parameter, the max F1 threshold from the training data model metrics is used.
4) If you train a model with training data and validation data and set the nfolds parameter, the max F1 threshold from the validation data model metrics is used.
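
The four cases above can be sketched in plain Python. This is only an illustration, not H2O's implementation: the function names (threshold_source, max_f1_threshold, apply_threshold) are hypothetical, the threshold search scans every distinct score rather than using H2O's own threshold grid (so the value it finds may differ slightly from the one H2O reports), and the rule "predict 1 when p1 >= threshold" is an assumption about how the label is derived from p1.

```python
from typing import List, Sequence


def threshold_source(has_validation: bool, uses_nfolds: bool) -> str:
    """Return which model metrics supply the max F1 threshold (cases 1-4)."""
    # Per the four cases, setting nfolds does not change the choice:
    # validation metrics are used whenever a validation frame was supplied,
    # otherwise training metrics are used. uses_nfolds is kept as an argument
    # only to make that explicit.
    return "validation" if has_validation else "training"


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def apply_threshold(p1: Sequence[float], threshold: float) -> List[int]:
    """Assumed labelling rule: predict class 1 when p1 >= threshold."""
    return [1 if p >= threshold else 0 for p in p1]


def max_f1_threshold(p1: Sequence[float], labels: Sequence[int]) -> float:
    """Scan each distinct score as a candidate threshold and keep the best F1."""
    best_t, best_f1 = 0.0, -1.0
    for t in sorted(set(p1)):
        pred = apply_threshold(p1, t)
        tp = sum(1 for y, yhat in zip(labels, pred) if y == 1 and yhat == 1)
        fp = sum(1 for y, yhat in zip(labels, pred) if y == 0 and yhat == 1)
        fn = sum(1 for y, yhat in zip(labels, pred) if y == 1 and yhat == 0)
        score = f1_score(tp, fp, fn)
        if score > best_f1:  # strict ">" keeps the lowest threshold on ties
            best_t, best_f1 = t, score
    return best_t
```

To reproduce the predicted class on a new testing set by hand, the threshold would come from the frame named by threshold_source (not from the testing set itself) and then be applied to the p1 column of the predictions. With a real model, the threshold should be read from H2O's own metrics rather than recomputed; the Python client appears to expose this as find_threshold_by_max_metric on binomial models, but check the documentation for your version.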
Source: https://stackoverflow.com/questions/53587308/which-threshold-does-h2o-predict-use-on-new-testing-set