Question
I have already read this question: How should we interpret the results of the H2O predict function? I still don't understand whether p1 is a probability in [0,1], and whether I can treat it like a regression output and apply my own threshold.
Edit: thank you for your answer, but I'm still confused about it, so let's dig in. Suppose my outcome Y takes the values 0 and 1. If Y is numeric, I run it as REGRESSION and get a single column as the response. On the other hand, if Y is a factor, I run it as CLASSIFICATION and the output is prediction/p0/p1. Now, is p1 the same as what I would get using Y as numeric? Also, the calibrate_model parameter (http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/calibrate_model.html) affects logloss, but is the max F1 threshold still applied to p0/p1, or to the calibrated probabilities? Can I use the calibrated probabilities in place of a regression output, since their logloss is supposed to be lower?
Answer 1:
The output of a binary classification model in H2O gives you the class label (assigned using the threshold that maximizes the F1 score), the predicted value for class 0 (p0), and the predicted value for class 1 (p1).
These predicted values are uncalibrated probabilities; if you want actual probabilities, you need to set H2O's model argument calibrate_model to True.
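To make the max-F1 labelling rule above concrete, here is a minimal sketch in plain Python. It is not H2O's implementation: the helper names (`f1_at`, `max_f1_threshold`), the scores, and the `>=` comparison are assumptions made for illustration only.

```python
def f1_at(threshold, p1, y):
    """F1 score when rows with p1 >= threshold are labelled 1."""
    tp = sum(1 for p, t in zip(p1, y) if p >= threshold and t == 1)
    fp = sum(1 for p, t in zip(p1, y) if p >= threshold and t == 0)
    fn = sum(1 for p, t in zip(p1, y) if p < threshold and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def max_f1_threshold(p1, y):
    """Try each distinct score as a cut-off and keep the one with the best F1."""
    return max(sorted(set(p1)), key=lambda t: f1_at(t, p1, y))


# Made-up p1 scores and true labels, for illustration only.
p1 = [0.10, 0.35, 0.40, 0.80, 0.90]
y = [0, 0, 1, 1, 1]

threshold = max_f1_threshold(p1, y)
labels = [int(p >= threshold) for p in p1]
print(threshold, labels)
```

The point of the sketch is that the label column is derived from p1 plus a cut-off, so p1 itself carries more information than the label.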
So to answer your question: yes, p1 is the predicted value between 0 and 1 (for example, you will see values like .23, .45, .89, etc.). Because H2O builds regression trees, you could technically use 1-p0 to get your p1 value (or vice versa). In fact, unless you set binomial_double_trees = True, this is exactly what H2O is doing: it builds a single regression tree for one of the classes and then takes 1-(that class value) to get the predicted value for the other class.
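As a rough sketch of what this means in practice, the plain-Python snippet below recovers p1 from p0 and applies a user-chosen cut-off instead of the max-F1 label. The helper names, the 0.5 threshold, and the sample values are hypothetical; with a real model you would read the p0/p1 columns from the prediction frame.

```python
def complement(p0):
    """Recover p1 from p0, relying on p0 + p1 == 1 for a binary model."""
    return [1.0 - p for p in p0]


def apply_threshold(p1, threshold):
    """Label a row 1 when its p1 reaches the chosen cut-off, else 0."""
    return [int(p >= threshold) for p in p1]


# Made-up p0 values, chosen so that p1 matches the .23, .45, .89 examples.
p0 = [0.77, 0.55, 0.11]
p1 = complement(p0)

# 0.5 is an arbitrary cut-off picked for this example.
labels = apply_threshold(p1, 0.5)
print(p1, labels)
```

Changing the threshold only changes the labels; the p1 values stay the same.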
Source: https://stackoverflow.com/questions/48925902/h2o-binary-classification-understand-p0-and-p1