I've saved my classifier pipeline using joblib:
vec = TfidfVectorizer(sublinear_tf=True, max_df=0.5, ngram_range=(1, 3))
pac_clf = PassiveAggressiveClassi
Just replace:
#load classifier and predict
classifier = joblib.load('class.pkl')
#vectorize/transform the new title then predict
vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5, ngram_range=(1, 3))
X_test = vectorizer.transform(title)
predict = classifier.predict(X_test)
return predict
by:
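import joblib

# load the saved pipeline that includes both the vectorizer
# and the classifier and predict
classifier = joblib.load('class.pkl')
# the pipeline vectorizes raw text itself; predict expects an iterable
# of documents, so wrap a single title string in a list
predict = classifier.predict([title])
return predict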
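For illustration, here is a minimal end-to-end sketch of the save/load round trip. It assumes a scikit-learn Pipeline with the hypothetical step names "vec" and "clf" (the question's pipeline definition is cut off, so its real steps and classifier parameters are unknown). The training titles and labels are made-up toy data, and the file is written to a temporary directory:

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import Pipeline

# Hypothetical toy data; the question's real titles and labels are not shown.
titles = [
    "cheap flights to paris",
    "discount hotel deals in rome",
    "python list comprehension tutorial",
    "how to fix a segmentation fault in c",
]
labels = ["travel", "travel", "programming", "programming"]

# Vectorizer settings copied from the question; the step names and
# random_state=0 for the classifier are assumptions.
pipeline = Pipeline([
    ("vec", TfidfVectorizer(sublinear_tf=True, max_df=0.5, ngram_range=(1, 3))),
    ("clf", PassiveAggressiveClassifier(random_state=0)),
])
pipeline.fit(titles, labels)

# Persist the whole fitted pipeline (vectorizer vocabulary + classifier weights).
path = os.path.join(tempfile.mkdtemp(), "class.pkl")
joblib.dump(pipeline, path)

# Later, possibly in another process: load it and predict directly on raw text.
loaded = joblib.load(path)
prediction = loaded.predict(["cheap hotel deals in paris"])
print(prediction)
```

Because the loaded object is the same fitted pipeline, it makes the same predictions it made before being saved, and no separate vectorizer is needed.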
Since class.pkl includes the full pipeline, there is no need to create a new vectorizer instance. A freshly constructed TfidfVectorizer has never been fitted, so it cannot transform anything. As the error message says, you need to reuse the vectorizer that was fitted in the first place, because the mapping from tokens (string n-grams) to column indices is stored in the vectorizer itself. This mapping is called the "vocabulary" (the fitted vectorizer's vocabulary_ attribute).
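A small sketch of why the vocabulary matters, using default TfidfVectorizer settings and made-up titles (both assumptions for illustration). Two vectorizers fitted on different text assign different column indices to the same token, and an unfitted one refuses to transform at all:

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up example titles.
train_titles = ["cheap flights to paris", "python list tutorial"]
other_titles = ["python flights"]

# Fitting learns the token -> column index mapping (the vocabulary).
fitted = TfidfVectorizer().fit(train_titles)
print(fitted.vocabulary_)

# A vectorizer fitted on different text learns a different mapping,
# so its columns would not line up with what the classifier was trained on.
refit = TfidfVectorizer().fit(other_titles)
print(refit.vocabulary_)

# A fresh, unfitted vectorizer has no vocabulary, so transform raises.
not_fitted = False
try:
    TfidfVectorizer().transform(["cheap flights"])
except NotFittedError as exc:
    not_fitted = True
    print("NotFittedError:", exc)

# Reusing the originally fitted vectorizer keeps the training column layout.
X_new = fitted.transform(["cheap flights"])
print(X_new.shape)
```

Here "python" lands in a different column in each fitted vectorizer, which is exactly the mismatch that persisting the original vectorizer (or the whole pipeline) avoids.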