naiveBayes and predict function not working in R

廉价感情. Submitted on 2019-12-12 03:27:57

Question


I am doing a sentiment analysis on Twitter comments (in the Kazakh language) using the R script below. The training set has 3000 comments (1500 sad, 1500 happy) and the test set has 1000 comments (happy and sad mixed). Everything runs without errors, but at the end every predicted value is "happy", which is not right.

I have checked every function, and everything works up to the naiveBayes call. I checked the classifier values and they are correct. I think either naiveBayes or predict is messing things up.

When I used only one happy comment (first on the list) and 1500 sad (negative) comments as the training set with this code, the predicted results were all happy, although I think they should have been mostly sad.

classifier = naiveBayes(mat[1500:3000,], as.factor(sentiment_all[1500:3000]))

However, when I used only sad (negative) comments for the training set, the predicted results were all sad.

classifier = naiveBayes(mat[1501:3000,], as.factor(sentiment_all[1501:3000]))
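A quick way to confirm what each of these two slices actually contains is to tabulate the labels. The sketch below uses a synthetic stand-in for sentiment_all, assuming the ordering from the script (1500 happy, then 1500 sad, then the test labels); the 500/500 test split is an assumption made only for illustration. Under that ordering, row 1500 is the last happy comment, and the all-sad slice still carries "happy" as an unused factor level because sentiment_all is already a factor.

```r
# Synthetic stand-in for sentiment_all: 1500 happy, 1500 sad, then 1000 test labels.
# The 500/500 test split is hypothetical and only serves this illustration.
sentiment_all <- as.factor(c(rep("happy", 1500), rep("sad", 1500),
                             rep("happy", 500), rep("sad", 500)))

# Rows 1500:3000 hold one happy label (row 1500) and 1500 sad labels.
one_happy <- sentiment_all[1500:3000]
print(table(one_happy))

# Rows 1501:3000 hold only sad labels, but subsetting a factor keeps all levels.
only_sad <- sentiment_all[1501:3000]
print(levels(only_sad))
print(table(only_sad))

# droplevels() removes the unused "happy" level if that is what is wanted.
print(levels(droplevels(only_sad)))
```

Checking table() on the training labels before fitting makes it obvious when a class is missing or nearly missing.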

I have spent hours on this and I am completely lost as to where the problem is. Please help me solve this issue.

Here is the script:

setwd("Path")
happy = readLines("Path")
sad = readLines("Path")
happy_test = readLines("Path")
sad_test = readLines("Path")

tweet = c(happy, sad)
tweet_test= c(happy_test, sad_test)
tweet_all = c(tweet, tweet_test)
sentiment = c(rep("happy", length(happy) ), 
              rep("sad", length(sad)))
sentiment_test = c(rep("happy", length(happy_test) ), 
                   rep("sad", length(sad_test)))
sentiment_all = as.factor(c(sentiment, sentiment_test))

library(RTextTools)
library(e1071)

# naive bayes
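# pass the weighting function by name; an unnamed argument here is matched
# positionally to a different create_matrix parameter
mat = create_matrix(tweet_all, language="kazakh", 
                    removeStopwords=FALSE, removeNumbers=TRUE, 
                    stemWords=FALSE, weighting=tm::weightTfIdf)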

mat = as.matrix(mat)

classifier = naiveBayes(mat[1:3000,], as.factor(sentiment_all[1:3000]))
predicted = predict(classifier, mat[3001:4000,]); predicted
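To see how far off the predictions are, rather than eyeballing the printed vector, they can be tabulated against the known test labels. This is a minimal sketch with small hypothetical stand-in vectors (truth and predicted are made up here, not taken from the tweets); with the script's objects the same call would compare predicted against sentiment_all[3001:4000].

```r
# Hypothetical stand-ins: 5 happy and 5 sad test labels,
# and a classifier that answers "happy" for every comment.
truth     <- factor(c(rep("happy", 5), rep("sad", 5)), levels = c("happy", "sad"))
predicted <- factor(rep("happy", 10), levels = c("happy", "sad"))

# Rows are predicted classes, columns are the actual classes.
confusion <- table(predicted = predicted, actual = truth)
print(confusion)

# Share of comments on the diagonal, i.e. classified correctly.
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```

An empty "sad" row in such a table is the same symptom described above: the model never predicts that class.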

Answer 1:


Your issue is very basic: you are setting up the problem wrong. Ideally you want a 50-50 split of positives and negatives in your training data. A Naive Bayes classifier estimates a prior and per-feature likelihoods for each class from the training examples, so a heavily unbalanced training set distorts both.

I am guessing that in the case where you have only one positive comment, the estimates for the "happy" class rest on that single example, which makes them unreliable across the many predictors.

Where you use no positive comments at all, you are basically saying that the only possible outcome is "sad", and that is exactly what your model predicts.
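The effect of the split on the class proportions can be checked directly, and a balanced training set can be drawn by sampling the same number of rows per class. The sketch below is only an illustration on synthetic labels; labels, n_per_class and idx are hypothetical names, and the sample size of 500 is arbitrary.

```r
set.seed(1)  # make the random draw reproducible

# Synthetic training labels in the same order as the question: 1500 happy, 1500 sad.
labels <- factor(c(rep("happy", 1500), rep("sad", 1500)))

# Class proportions of the unbalanced slice used in the question.
print(prop.table(table(labels[1500:3000])))

# Draw the same number of row indices from each class.
n_per_class <- 500
idx <- unlist(lapply(split(seq_along(labels), labels),
                     function(rows) sample(rows, n_per_class)))

# The resulting training labels are split 50-50.
train_labels <- labels[idx]
print(table(train_labels))
```

The same idx could then be used to select the matching rows of the feature matrix, for example mat[idx, ], so that features and labels stay aligned.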

As for your main issue, try using a different data set. Where are you getting your tweets from? Are they sufficiently diverse?



Source: https://stackoverflow.com/questions/36643592/naivebayes-and-predict-function-not-working-in-r
