inconsistent shape error MultiLabelBinarizer on y_test, sklearn multi-label classification

北海茫月 2021-01-25 12:18
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSV         


        
1 Answer
  • 2021-01-25 12:54

    Call only transform() on test data. Never call fit() or its variants, such as fit_transform() or fit_predict(), on test data; those are for training data only.

    So change the line:

    Y_test = mlb.fit_transform(y_test)

    to

    Y_test = mlb.transform(y_test)
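
A minimal sketch of the fit-on-train, transform-on-test pattern. The label lists here are toy data invented for illustration, not the asker's dataset; only `mlb`, `y_test`, and `Y_test` are names taken from the question.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Toy label lists, made up for illustration only.
y_train = [["a", "b"], ["c"], ["a", "c"]]
y_test = [["b"], ["a", "c"]]

mlb = MultiLabelBinarizer()
Y_train = mlb.fit_transform(y_train)  # learns the label set from training data
Y_test = mlb.transform(y_test)        # reuses that label set, no refitting

# Both matrices have one column per label learned during fit.
print(Y_train.shape, Y_test.shape)
```

Because the binarizer is fitted once, both matrices share the same number of columns and the same column order.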

    Explanation:

    When you call fit() or fit_transform(), mlb discards what it learned earlier and learns from the newly supplied data instead. That is a problem when Y_train and Y_test contain different labels, as they do in your case.

    In your case, Y_train has 49 distinct labels, whereas Y_test has only 42. That does not mean Y_test is simply 7 labels short of Y_train. Y_test may contain an entirely different set of labels that happens to binarize into 42 columns, in which case the columns no longer correspond to those of Y_train, and that will affect the results.
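
The mismatch described above can be reproduced on a small scale. This is a sketch with invented labels (the asker's 49 and 42 labels are not available here); `refit` is a hypothetical second binarizer used only to show what refitting on test data does.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Toy data, made up for illustration: "e" never appears in training.
y_train = [["a", "b"], ["c"], ["a", "d"]]
y_test = [["b"], ["e", "a"]]

mlb = MultiLabelBinarizer()
mlb.fit(y_train)
train_classes = list(mlb.classes_)

# Refitting on the test labels learns a different label set.
refit = MultiLabelBinarizer()
Y_test_wrong = refit.fit_transform(y_test)
test_classes = list(refit.classes_)

# The two label sets differ in both directions, not just in size.
only_in_test = sorted(set(test_classes) - set(train_classes))
only_in_train = sorted(set(train_classes) - set(test_classes))
print(only_in_test, only_in_train)

# transform() keeps the training columns; scikit-learn warns about the
# unseen label "e" and ignores it.
Y_test_right = mlb.transform(y_test)
print(Y_test_wrong.shape, Y_test_right.shape)
```

Comparing `mlb.classes_` against the labels present in the test set is a quick way to check whether the test data contains labels the model was never trained on.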
