Testing text classification ML model with new data fails

终归单人心 2020-12-22 13:46

I have built a machine learning model to classify emails as spam or not. Now I want to test it on my own email and see the result, so I wrote the following code to classify the n

1 answer
  • 2020-12-22 13:59

    Preprocessing steps in a pipeline like this are fitted once, on the training data, and are never fitted again at prediction time. Your code does fit again, by defining a new count vectorizer for the new email.

    So, instead of calling fit_transform on a new count vectorizer, reuse the existing one (i.e. the one fitted on your training data) and call its transform method. That maps your new data onto the same 37229 features the model was trained on, instead of the 13 features produced when a count vectorizer is fitted again on such a short text.
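
A minimal sketch of the fit-once, transform-later pattern described above, assuming scikit-learn's CountVectorizer. The toy emails, the labels, and the choice of MultinomialNB are illustrative assumptions, since the original training code and model are not shown in the question.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data (hypothetical; stands in for the real email corpus).
train_texts = [
    "win a free prize now",
    "free money click now",
    "meeting agenda for monday",
    "lunch with the project team",
]
train_labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

# Fit the vectorizer once, on the training data only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# The classifier is an assumption; any model trained on X_train behaves the same way here.
model = MultinomialNB()
model.fit(X_train, train_labels)

# New email: reuse the fitted vectorizer and call transform, not fit_transform.
new_email = ["claim your free prize"]
X_new = vectorizer.transform(new_email)

# X_new has the same number of columns as X_train, so predict accepts it.
prediction = model.predict(X_new)

# For contrast: fitting a fresh vectorizer on the new text alone builds a
# different, much smaller vocabulary that the trained model cannot use.
X_wrong = CountVectorizer().fit_transform(new_email)
print(X_train.shape[1], X_new.shape[1], X_wrong.shape[1])
```

If prediction happens in a separate script, the fitted vectorizer has to be saved and loaded together with the model (for example with pickle), so that the same vocabulary is available there.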
