Question
I have written a simple function that takes one argument, "query_seq"; further methods calculate descriptors, and in the end predictions can be made with LogisticRegression (or any other algorithm passed to the function) as "0" (negative for the given case) or "1" (positive for the given case):
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

def main_process(query_seq):
    # Candidate classifiers; only LR is passed to Prediction below.
    LR = LogisticRegression()
    GNB = GaussianNB()
    KNB = KNeighborsClassifier()
    DT = DecisionTreeClassifier()
    SV = SVC(probability=True)
    train_x, train_y, train_l = data_gen(p)  # data_gen and p are defined elsewhere
    a = DC_CLASS()                           # descriptor calculator, defined elsewhere
    test_x = a.main_p(query_seq)
    return Prediction(train_x, train_y, test_x, LR)
While performing cross-validation we calculated various statistical parameters for accuracy estimation (specificity, sensitivity, MCC, etc.) for an algorithm. Now my question is: is there any method in scikit-learn through which we can estimate the confidence score for a prediction on test data?
Answer 1:
Many classifiers can give you a hint of their own confidence level for a given prediction by calling the predict_proba method instead of the predict method. Read the docstring of that method to understand the content of the numpy array it returns.
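For instance, here is a minimal sketch (toy synthetic data, hypothetical shapes) contrasting predict with predict_proba:

from sklearn.linear_model import LogisticRegression
import numpy as np

X = np.random.rand(100, 4)         # toy feature matrix
y = np.random.randint(0, 2, 100)   # toy binary labels

clf = LogisticRegression().fit(X, y)

X_test = np.random.rand(3, 4)
print(clf.predict(X_test))         # hard labels, e.g. [0 1 1]
print(clf.predict_proba(X_test))   # one row per sample, one column per class
# Each row sums to 1; clf.classes_ gives the column order.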
Note, however, that classifiers can also make mistakes in estimating their own confidence level. To fix this you can use an external calibration procedure to calibrate the classifier on held-out data (using a cross-validation loop). The documentation gives more details on calibration:
http://scikit-learn.org/stable/modules/calibration.html
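As an illustration, here is a minimal sketch using scikit-learn's CalibratedClassifierCV on toy data (note: in older scikit-learn versions the first argument is named base_estimator rather than estimator):

from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import SVC
import numpy as np

X = np.random.rand(200, 4)          # toy features
y = np.random.randint(0, 2, 200)    # toy binary labels

base = SVC()  # SVC exposes decision_function; calibration maps it to probabilities
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=5)  # internal CV loop on held-out folds
calibrated.fit(X, y)
print(calibrated.predict_proba(X[:3]))  # calibrated class probabilities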
Finally, note that LogisticRegression gives reasonably well-calibrated confidence levels by default; most other model classes benefit from external calibration.
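To check calibration yourself, one possible sketch uses calibration_curve on toy data; a well-calibrated model's curve stays close to the diagonal:

from sklearn.calibration import calibration_curve
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import numpy as np

X = np.random.rand(500, 4)
y = (X[:, 0] + 0.2 * np.random.randn(500) > 0.5).astype(int)  # noisy toy labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

probs = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
prob_true, prob_pred = calibration_curve(y_te, probs, n_bins=10)
print(np.c_[prob_pred, prob_true])  # predicted vs. observed frequency per bin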
Source: https://stackoverflow.com/questions/36643344/how-to-assess-the-confidence-score-of-a-prediction-with-scikit-learn