Bringing a classifier to production

梦谈多话 2021-01-05 11:25

I've saved my classifier pipeline using joblib:

vec = TfidfVectorizer(sublinear_tf=True, max_df=0.5, ngram_range=(1, 3))
pac_clf = PassiveAggressiveClassi         


        
1 answer
  • 2021-01-05 11:47

    Just replace:

      #load classifier and predict
      classifier = joblib.load('class.pkl')
    
      #vectorize/transform the new title then predict
      vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5, ngram_range=(1, 3))
      X_test = vectorizer.transform(title)
      predict = classifier.predict(X_test)
      return predict
    

    with:

      # load the saved pipeline, which includes both the vectorizer
      # and the classifier, then predict directly on the raw title text
      # (title must be an iterable of strings, e.g. a list)
      classifier = joblib.load('class.pkl')
      predict = classifier.predict(title)
      return predict
    

    class.pkl includes the full pipeline, so there is no need to create a new vectorizer instance. As the error message says, you need to reuse the vectorizer that was fitted in the first place, because the feature mapping from tokens (string n-grams) to column indices is stored in the vectorizer itself. This mapping is called the "vocabulary".
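    To make the point about the vocabulary concrete, here is a minimal sketch of the whole round trip. It assumes the saved object is a scikit-learn Pipeline combining the vectorizer and the classifier. The training titles, the labels, the step names "vec" and "clf", and the temporary file location are all made up for illustration and are not taken from the question.

```python
import os
import tempfile

import joblib
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import Pipeline

# Hypothetical toy data, for illustration only.
titles = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "monday project meeting notes",
]
labels = ["spam", "spam", "ham", "ham"]

# Vectorizer and classifier live in one Pipeline, so fitting the pipeline
# also fits the vectorizer and stores its vocabulary.
pipeline = Pipeline([
    ("vec", TfidfVectorizer(sublinear_tf=True, max_df=0.5, ngram_range=(1, 3))),
    ("clf", PassiveAggressiveClassifier(random_state=0)),
])
pipeline.fit(titles, labels)

new_titles = ["buy cheap pills", "notes for the monday meeting"]

# Save and reload the whole pipeline (a temporary directory stands in
# for wherever class.pkl is really stored).
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "class.pkl")
    joblib.dump(pipeline, model_path)
    loaded = joblib.load(model_path)

# The loaded pipeline accepts raw strings: it vectorizes them with the
# fitted vocabulary and then classifies them.
predictions = loaded.predict(new_titles)

# A brand-new vectorizer has no vocabulary, so transform() fails.
fresh_vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5, ngram_range=(1, 3))
try:
    fresh_vectorizer.transform(new_titles)
    fresh_failed = False
except NotFittedError:
    fresh_failed = True
```

    The fitted vocabulary can be inspected through loaded.named_steps["vec"].vocabulary_, which maps each n-gram to its column index. The classifier's weights are tied to those column indices, which is why the original vectorizer has to be reused.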
