Dealing with negative values in sklearn MultinomialNB

别跟我提以往 2021-01-02 01:02

I am normalizing my text input before running MultinomialNB in sklearn like this:

vectorizer = TfidfVectorizer(max_df=0.5, stop_words='english', use_idf=True)
…


        
1 Answer
  • 2021-01-02 01:06

    I recommend not using Naive Bayes with SVD or other matrix factorizations, because Naive Bayes is based on applying Bayes' theorem with strong (naive) independence assumptions between the features. Use another classifier instead, for example RandomForest.
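A concrete reason this combination breaks in scikit-learn: the LSA output of `TruncatedSVD` contains negative values, while `MultinomialNB` only accepts non-negative (count-like) features and raises a `ValueError` when fit on them. The sketch below reproduces this on a hypothetical four-document toy corpus. The corpus and labels are made up for illustration, and the exact error wording differs between scikit-learn versions.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus: two "pets" documents and two "finance" documents.
# Every document shares the word "news", so the second SVD component
# is forced to take both positive and negative values.
docs = [
    "cat dog news",
    "cat pet news",
    "stock market news",
    "stock price news",
]
labels = [0, 0, 1, 1]

tfidf = TfidfVectorizer().fit_transform(docs)

# LSA step: TruncatedSVD on the TF-IDF matrix
lsa_features = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
print("smallest LSA feature value:", lsa_features.min())

error_message = None
try:
    MultinomialNB().fit(lsa_features, labels)
except ValueError as exc:
    # MultinomialNB checks that X is non-negative before fitting
    error_message = str(exc)
print("MultinomialNB error:", error_message)
```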

    I ran the following experiment:

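    from sklearn.decomposition import NMF
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.preprocessing import Normalizer

    # raw_text_train / train_labels: the training texts and labels (not shown)
    vectorizer = TfidfVectorizer(max_df=0.5, stop_words='english', use_idf=True)
    nmf = NMF(n_components=100)  # non-negative factorization instead of SVD
    mnb = MultinomialNB(alpha=0.01)

    train_text = vectorizer.fit_transform(raw_text_train)
    train_text = nmf.fit_transform(train_text)  # output is non-negative
    train_text = Normalizer(copy=False).fit_transform(train_text)

    mnb.fit(train_text, train_labels)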
    

    This is the same setup as yours, but with NMF (non-negative matrix factorization) instead of SVD, and it got 0.04% accuracy.
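The NMF variant at least runs because NMF factors the non-negative TF-IDF matrix into non-negative factors. The transformed features therefore stay non-negative and `MultinomialNB` accepts them, although this says nothing about whether the resulting accuracy is any good. Here is a minimal sketch on a hypothetical toy corpus. The `init="nndsvda"`, `max_iter` and `random_state` settings are chosen here for reproducibility and are not taken from the answer.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import Normalizer

# Hypothetical toy corpus and labels, for illustration only
docs = [
    "cat dog news",
    "cat pet news",
    "stock market news",
    "stock price news",
]
labels = [0, 0, 1, 1]

tfidf = TfidfVectorizer().fit_transform(docs)

# NMF returns a non-negative document-topic matrix W (tfidf is approximately W @ H)
nmf = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
topic_features = nmf.fit_transform(tfidf)

# L2 row normalization rescales rows but cannot introduce negative values
topic_features = Normalizer().fit_transform(topic_features)
print("all features non-negative:", bool(np.all(topic_features >= 0)))

# Unlike the SVD case, this fit does not raise a ValueError
mnb = MultinomialNB(alpha=0.01).fit(topic_features, labels)
predictions = mnb.predict(topic_features)
print("predictions:", predictions)
```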

    Replacing MultinomialNB with RandomForest, I got 79% accuracy.
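Swapping the classifier sidesteps the non-negativity requirement because tree ensembles accept features of either sign. This means the usual LSA step (`TruncatedSVD`) can stay. The sketch below shows this as a scikit-learn pipeline on a hypothetical toy corpus. The `n_components`, `n_estimators` and `random_state` values are illustrative and are not the answer's settings.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Hypothetical toy corpus and labels, for illustration only
docs = [
    "cat dog news",
    "cat pet news",
    "stock market news",
    "stock price news",
]
labels = [0, 0, 1, 1]

model = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=2, random_state=0),  # LSA: output can be negative
    Normalizer(copy=False),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(docs, labels)  # the forest does no non-negativity check

# Features the forest actually sees (every step except the last)
lsa_features = model[:-1].transform(docs)
print("smallest LSA feature value:", lsa_features.min())
print("predictions:", model.predict(["cat news", "market news"]))
```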

    So either change the classifier or don't apply matrix factorization.
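The other option is to drop the factorization. TF-IDF features are non-negative, so `MultinomialNB` can consume them directly. Below is a minimal pipeline on a hypothetical toy corpus. The `alpha=0.01` setting mirrors the answer, and everything else uses the defaults.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus and labels, for illustration only
docs = [
    "cat dog news",
    "cat pet news",
    "stock market news",
    "stock price news",
]
labels = [0, 0, 1, 1]

# TF-IDF weights are >= 0, which is what MultinomialNB expects
model = make_pipeline(TfidfVectorizer(), MultinomialNB(alpha=0.01))
model.fit(docs, labels)

train_predictions = model.predict(docs)
print("training predictions:", list(train_predictions))
print("new document:", model.predict(["dog pet news"]))
```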
