Which threshold does h2o.predict() use on a new test set?

Submitted by 。_饼干妹妹 on 2020-01-14 04:08:06

Question


I have read several threads here about the differences between h2o.predict() and h2o.performance() (for example, the thread linked below).

How to interpret the probabilities (p0, p1) of the result of h2o.predict()

Can someone tell me which threshold h2o.predict() uses? Is it max F1? If so, is it the threshold from the training data, the validation data, or cross-validation?

I tried applying the validation thresholds for max f1 and max f0point5 to the test set (which is completely separate from the training and validation data), but the class predicted by h2o.predict() and the class obtained by applying the threshold don't match completely.

The closest match I got was from taking the max f0point5 threshold from the training data and applying it to the test set.

There is not much documentation on h2o.predict(). Also, is there a best practice for choosing the threshold, e.g. the mean of the validation and training thresholds?

Thanks in advance!


Answer 1:


Here is how the prediction threshold is selected when you run h2o.predict() (R) or .predict() (Python):

1) If you train a model with only training data, the max F1 threshold from the training data model metrics is used.

2) If you train a model with training and validation data, the max F1 threshold from the validation data model metrics is used.

3) If you train a model with training data and set the nfolds parameter, the max F1 threshold from the training data model metrics is used.

4) If you train a model with training data and validation data and set the nfolds parameter, the max F1 threshold from the validation data model metrics is used.
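
To make the four rules above concrete, here is a minimal sketch in plain Python. It makes no H2O calls. The helper names (threshold_source, max_f1_threshold, apply_threshold) and the toy data are hypothetical, and the `p1 >= threshold` comparison is an assumption about how the cutoff is applied. H2O's own reported threshold may also differ slightly from this brute-force search, so treat it as an illustration of the idea rather than a reproduction of the library's internals.

```python
from typing import List, Sequence, Tuple


def threshold_source(has_validation: bool, nfolds: int = 0) -> str:
    """Which model metrics supply the max F1 threshold, per the four rules.

    nfolds is accepted only to show that it does not change the outcome.
    """
    return "validation" if has_validation else "training"


def f1_at_threshold(p1: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 score when rows with p1 >= threshold are labelled 1 (assumed rule)."""
    tp = fp = fn = 0
    for p, y in zip(p1, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def max_f1_threshold(p1: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Try every distinct probability as a cutoff; return (threshold, F1) with the best F1."""
    best_t, best_f1 = 0.0, -1.0
    for t in sorted(set(p1)):
        f1 = f1_at_threshold(p1, labels, t)
        if f1 > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


def apply_threshold(p1: Sequence[float], threshold: float) -> List[int]:
    """Turn p1 probabilities into class labels using a fixed cutoff."""
    return [1 if p >= threshold else 0 for p in p1]


# Toy validation data (made up): p1 probabilities and true labels.
valid_p1 = [0.1, 0.4, 0.35, 0.8, 0.65, 0.2]
valid_y = [0, 0, 1, 1, 1, 0]

# Rules 2 and 4: a validation frame was supplied, so its metrics are used.
source = threshold_source(has_validation=True, nfolds=5)
threshold, best_f1 = max_f1_threshold(valid_p1, valid_y)

# The threshold is fixed at this point and reused on unseen test data.
test_p1 = [0.3, 0.35, 0.9, 0.5]
test_labels = apply_threshold(test_p1, threshold)
print(source, threshold, round(best_f1, 3), test_labels)
```

The point of the sketch is that the threshold is chosen once, from the training or validation metrics, and then reused unchanged on the test set. A threshold recomputed from other metrics, or one that maximises a different criterion such as f0point5, will generally produce labels that differ from the predict() output.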



Source: https://stackoverflow.com/questions/53587308/which-threshold-does-h2o-predict-use-on-new-testing-set
