Sklearn - How to predict probability for all target labels

核能气质少年 提交于 2019-12-19 08:13:21

问题


I have a data set with a target variable that can have 7 different labels. Each sample in my training set has only one label for the target variable.

For each sample, I want to calculate the probability for each of the target labels. So my prediction would consist of 7 probabilities for each row.

On the sklearn website I read about multi-label classification, but this doesn't seem to be what I want.

I tried the following code, but this only gives me one classification per sample.

from sklearn.multiclass import OneVsRestClassifier
clf = OneVsRestClassifier(DecisionTreeClassifier())
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

Does anyone have some advice on this? Thanks!


回答1:


You can do that by simply removing the OneVsRestClassifer and using predict_proba method of the DecisionTreeClassifier. You can do the following:

clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
pred = clf.predict_proba(X_test)

This will give you a probability for each of your 7 possible classes.

Hope that helps!




回答2:


You can try using scikit-multilearn - an extension of sklearn that handles multilabel classification. If your labels are not overly correlated you can train one classifier per label and get all predictions - try (after pip install scikit-multilearn):

from skmultilearn.problem_transform import BinaryRelevance    
classifier = BinaryRelevance(classifier = DecisionTreeClassifier())

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

Predictions will contain a sparse matrix of size (n_samples, n_labels) in your case - n_labels = 7, each column contains prediction per label for all samples.

In case your labels are correlated you might need more sophisticated methods for multi-label classification.

Disclaimer: I'm the author of scikit-multilearn, feel free to ask more questions.



来源:https://stackoverflow.com/questions/38403951/sklearn-how-to-predict-probability-for-all-target-labels

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!