inconsistent shape error MultiLabelBinarizer on y_test, sklearn multi-label classification

Submitted by 旧巷老猫 on 2019-12-02 04:41:54

You should call only transform() on test data. Never call fit() or its variants, such as fit_transform() or fit_predict(), on test data. Those should be used only on training data.

So change the line:

Y_test = mlb.fit_transform(y_test)

to

Y_test = mlb.transform(y_test)
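
For illustration, here is a minimal sketch of the fit-on-train, transform-on-test pattern. The toy label lists are made up for this example and are not the data from the question; only the variable names mlb, Y_train and Y_test follow the snippet above.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: each sample is a list of labels.
y_train = [["a", "b"], ["c"], ["a", "d"]]
y_test = [["b"], ["a", "c"]]

mlb = MultiLabelBinarizer()
Y_train = mlb.fit_transform(y_train)  # learns the label set from training data
Y_test = mlb.transform(y_test)        # reuses the columns learned above

print(mlb.classes_)
print(Y_train.shape)  # one column per label seen in training
print(Y_test.shape)   # same number of columns as Y_train
```

Because the binarizer is fitted once, both arrays share the same column layout, which is what the downstream classifier and metrics expect.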

Explanation:

When you call fit() or fit_transform(), mlb discards what it learned previously and learns the label set of the newly supplied data instead. This is a problem when Y_train and Y_test contain different labels, as they do in your case.

In your case, Y_train has 49 distinct labels, whereas Y_test has only 42, so the two binarized arrays end up with different numbers of columns. But this doesn't necessarily mean that Y_test is simply missing 7 of the labels in Y_train. Y_test may contain an entirely different set of labels that happens to binarize to 42 columns, in which case a given column index no longer refers to the same label in both arrays, and that will affect the results.
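
The point about differing label sets can be sketched with toy data. The labels below are hypothetical and chosen so that training and test data each have four distinct labels but share only one; the variable names refit, Y_test_wrong and Y_test_right are introduced just for this example.

```python
import warnings

from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: e, f and g never appear in the training labels.
y_train = [["a", "b"], ["c"], ["a", "d"]]
y_test = [["b"], ["e", "f"], ["b", "g"]]

mlb = MultiLabelBinarizer()
mlb.fit(y_train)
train_classes = list(mlb.classes_)

# Wrong: fitting a binarizer on the test labels learns a different label set.
refit = MultiLabelBinarizer()
Y_test_wrong = refit.fit_transform(y_test)
test_classes = list(refit.classes_)

print(train_classes)  # labels behind the training columns
print(test_classes)   # same column count here, but different labels

# Right: transform() keeps the training columns. Labels not seen during
# fit are ignored, and scikit-learn emits a warning about them, which is
# silenced here only to keep the output short.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    Y_test_right = mlb.transform(y_test)

print(Y_test_right)
```

In this sketch both binarizers produce four columns, yet only the label b is common to them, so matching shapes alone would not show that the columns are aligned.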
