naivebayes

How to use a PoS tag as a feature for training data with a Naive Bayes classifier?

Submitted by 我是研究僧i on 2019-12-11 03:05:33
Question: I'm researching how to extract keyphrases from documents for my thesis. In my research, I use a Naive Bayes classifier to build a training model from the candidate-term features. One of the features is the PoS tag, which I think is important for deciding whether a term is a keyphrase or not. But the input of a Naive Bayes (NB) classifier must be numeric, and the PoS tag is a string, so I don't know how to represent the PoS tag feature as a number so that it can become an input feature for NB.
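A common way around this is to one-hot encode the tag: each distinct PoS tag becomes its own 0/1 column. Below is a minimal sketch using scikit-learn; the tag list and labels are made up for illustration and are not from the question.

```python
# Minimal sketch: one-hot encode PoS tags so a numeric NB model can use them.
# The tags and labels below are illustrative stand-ins.
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.naive_bayes import MultinomialNB

pos_tags = [["NN"], ["JJ"], ["NN"], ["VB"]]   # one tag per candidate term
labels = np.array([1, 0, 1, 0])               # 1 = keyphrase, 0 = not

encoder = OneHotEncoder(handle_unknown="ignore")
X = encoder.fit_transform(pos_tags)           # sparse 0/1 matrix, one column per tag

clf = MultinomialNB()
clf.fit(X, labels)
print(clf.predict(encoder.transform([["NN"]])))
```

The same encoder can be reused at prediction time, so unseen tags are handled consistently (here they simply encode to all zeros).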

How to get the probabilities of classes in Spark Naive Bayes classifier?

Submitted by 好久不见. on 2019-12-11 02:07:14
Question: I'm training a NaiveBayesModel in Spark; however, when I use it to predict a new instance I need to get the probabilities for each class. I looked at the code of the predict function in NaiveBayesModel and came up with the following code:

```scala
val thetaMatrix = new DenseMatrix(model.labels.length, model.theta(0).length, model.theta.flatten, true)
val piVector = new DenseVector(model.pi)
//val prob = thetaMatrix.multiply(test.features)
val x = test.map { p =>
  val prob = thetaMatrix.multiply(p.features)
```
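Continuing the idea in the (truncated) snippet above: in Spark MLlib, `pi` holds the log class priors and `theta` the log conditional probabilities, so per-class probabilities can be recovered with a matrix product followed by a softmax. A hedged PySpark/NumPy sketch, assuming a multinomial `pyspark.mllib.classification.NaiveBayesModel` named `model`; this is not code from the question:

```python
# Sketch: recover per-class probabilities from a multinomial MLlib
# NaiveBayesModel, whose pi/theta are stored as log values.
import numpy as np

def class_probabilities(model, features):
    x = np.asarray(features)
    # log P(class) + sum_j x_j * log P(feature_j | class)
    log_posterior = np.asarray(model.theta).dot(x) + np.asarray(model.pi)
    log_posterior -= log_posterior.max()     # shift for numerical stability
    probs = np.exp(log_posterior)
    return dict(zip(model.labels, probs / probs.sum()))
```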

naive classifier matlab

Submitted by 余生长醉 on 2019-12-10 10:11:32
Question: When testing the naive Bayes classifier in MATLAB I get different results even though I trained and tested on the same sample data. I was wondering whether my code is correct, and whether someone could help explain why this happens?

```matlab
%% dimensionality reduction
columns = 6;
[U,S,V] = svds(fulldata, columns);

%% randomly select dataset
rows = 1000;
columns = 6;

%# pick random rows
indX = randperm(size(fulldata,1));
indX = indX(1:rows)';

%# pick random columns
%indY = randperm(size(fulldata,2));
indY = indY(1:columns
```
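Two things stand out in the snippet: `randperm` draws a fresh permutation on every run, so an unseeded script selects a different subset of rows each time, and `indY` is used even though the line that would create it is commented out. A minimal sketch of the reproducibility fix, in Python with names mirroring the MATLAB snippet (the data itself is made up):

```python
# Sketch: seed the RNG so randomly selected rows/columns are identical
# on every run. `fulldata`, `rows`, `columns` mirror the MATLAB names.
import numpy as np

rng = np.random.default_rng(seed=42)          # fixed seed -> reproducible subsets
fulldata = rng.normal(size=(5000, 20))        # stand-in data

rows, columns = 1000, 6
ind_x = rng.permutation(fulldata.shape[0])[:rows]     # random rows
ind_y = rng.permutation(fulldata.shape[1])[:columns]  # random columns

subset = fulldata[np.ix_(ind_x, ind_y)]
print(subset.shape)   # (1000, 6), and the same values on every run
```

In MATLAB the equivalent is a call such as `rng(42)` before the `randperm` lines.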

Python: Loaded NLTK Classifier not working

Submitted by 我的未来我决定 on 2019-12-10 07:36:02
Question: I'm trying to train an NLTK classifier for sentiment analysis and then save the classifier using pickle. The freshly trained classifier works fine. However, if I load a saved classifier, it outputs either 'positive' or 'negative' for ALL examples. I'm saving the classifier using

```python
classifier = nltk.NaiveBayesClassifier.train(training_set)
classifier.classify(words_in_tweet)
f = open('classifier.pickle', 'wb')
pickle.dump(classifier, f)
f.close()
```

and loading the classifier using f
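Pickling an NLTK classifier does round-trip correctly; the usual cause of "one label for everything" is that the loaded classifier is fed features in a different shape than at training time. A hedged sketch of the symmetric save/load pattern, where `tweet_features` is a hypothetical extractor standing in for whatever built `training_set`:

```python
# Sketch: the *same* feature-extraction function must be applied at
# prediction time as was used to build the training set.
import pickle

def tweet_features(words):
    # hypothetical extractor -- must match the one used for training_set
    return {word: True for word in words}

# saving (as in the question):
# with open('classifier.pickle', 'wb') as f:
#     pickle.dump(classifier, f)

# loading:
with open('classifier.pickle', 'rb') as f:
    classifier = pickle.load(f)

print(classifier.classify(tweet_features(['great', 'movie'])))
```

If raw word lists (rather than feature dicts) reach `classify` after loading, the classifier sees none of its known features and falls back to the majority class.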

Simple text classification using naive bayes (weka) in java

Submitted by 馋奶兔 on 2019-12-10 04:35:09
Question: I'm trying to do text classification with naive Bayes using the Weka library in my Java code, but I think the result of the classification is not correct and I don't know what the problem is. I use an ARFF file for the input. This is my training data:

```
@relation hamspam
@attribute text string
@attribute class {spam,ham}
@data
'good',ham
'good',ham
'very good',ham
'bad',spam
'very bad',spam
'very bad, very bad',spam
'good good bad',ham
```

This is my testing data:

```
@relation test
@attribute text string
@attribute class {spam
```
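In Weka, a string attribute normally has to be converted into word-count features (the StringToWordVector filter) before NaiveBayes can learn anything useful from it. The sketch below shows the analogous pipeline in Python with scikit-learn standing in for Weka, using the same toy data; it illustrates the technique, not the asker's code:

```python
# Sketch: convert raw strings to word-count vectors, then fit multinomial
# Naive Bayes -- the role StringToWordVector plays before NaiveBayes in Weka.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts  = ['good', 'good', 'very good', 'bad', 'very bad',
          'very bad, very bad', 'good good bad']
labels = ['ham', 'ham', 'ham', 'spam', 'spam', 'spam', 'ham']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)      # one column per distinct word

clf = MultinomialNB().fit(X, labels)
print(clf.predict(vectorizer.transform(['very bad'])))   # -> ['spam']
```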

Warnings while using the Naive Bayes Classifier in the Caret Package

Submitted by 穿精又带淫゛_ on 2019-12-09 23:41:34
Question: I am attempting to run a supervised machine learning classifier known as Naive Bayes in the caret package. My data is called LDA.scores, and has two categorical factors called "V4" and "G8", and 12 predictor variables. The code that I am using was adapted by a kind person on Stack Overflow from code supplied by myself (see link below). The code does work; however, only 9 predictors were used instead of the 12 predictors in the data-set. When I tried to train the Naive Bayes model with the
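Without the full code it is hard to say why three predictors were dropped, but a common cause is a formula or column-selection step that silently excludes columns. A hedged, language-neutral sketch of the safeguard (Python here, with a hypothetical file and column names): select the predictors explicitly and assert the count before training.

```python
# Sketch: select the 12 predictor columns explicitly and check the shape,
# so silently dropped predictors are caught early. File/column names are
# hypothetical stand-ins for the LDA.scores data.
import pandas as pd
from sklearn.naive_bayes import GaussianNB

lda_scores = pd.read_csv('LDA.scores.csv')            # hypothetical file
predictors = [c for c in lda_scores.columns if c not in ('V4', 'G8')]
assert len(predictors) == 12, f'expected 12 predictors, got {len(predictors)}'

X, y = lda_scores[predictors], lda_scores['V4']
model = GaussianNB().fit(X, y)
```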

Naive Bayes: the within-class variance in each feature of TRAINING must be positive

Submitted by 妖精的绣舞 on 2019-12-09 17:44:32
Question: When trying to fit Naive Bayes:

```matlab
training_data = sample;
target_class = K8;
% train model
nb = NaiveBayes.fit(training_data, target_class);
% prediction
y = nb.predict(cluster3);
```

I get an error:

```
??? Error using ==> NaiveBayes.fit>gaussianFit at 535
The within-class variance in each feature of TRAINING must be positive.
The within-class variance in features 2 5 6 in class normal are not positive.

Error in ==> NaiveBayes.fit at 498
obj = gaussianFit(obj, training, gindex);
```

Can anyone shed light on this?
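The error means that at least one feature is constant within some class (here features 2, 5 and 6 inside class "normal"), so the Gaussian fit has zero variance to work with. Typical remedies are dropping those features for that model, adding a tiny jitter, or switching to a kernel density estimate (MATLAB's old NaiveBayes class accepted a 'Distribution','kernel' option for this, if memory serves). A minimal diagnostic sketch, with `X` and `y` as stand-ins for the questioner's `sample`/`K8`:

```python
# Sketch: find features whose variance is zero within some class --
# exactly the condition the MATLAB error complains about.
import numpy as np

def zero_variance_features(X, y):
    bad = {}
    for cls in sorted(set(y)):
        var = X[y == cls].var(axis=0)
        cols = np.where(var == 0)[0]
        if cols.size:
            bad[cls] = cols        # candidate columns to drop or jitter
    return bad

X = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 7.0], [4.0, 8.0]])
y = np.array(['normal', 'normal', 'attack', 'attack'])
print(zero_variance_features(X, y))   # feature 1 is constant in 'normal'
```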

Plotting a linear discriminant analysis, classification tree and Naive Bayes Curve on a single ROC plot

Submitted by 为君一笑 on 2019-12-06 22:17:51
The data is present at the very bottom of the page and is called LDA.scores. This is a classification task in which I performed three supervised machine learning classification techniques on the data-set. All of the code used to produce these ROC curves is supplied. I apologise for asking a loaded question, but I have been trying to solve these issues with different combinations of code for almost two weeks, so if anyone can help me, thank you. The main issue is that the Naive Bayes curve shows a perfect score of 1, which is obviously wrong, and I cannot work out how to incorporate the linear discriminant analysis curve into the same ROC plot.
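A hedged sketch of the general recipe (scikit-learn/matplotlib on synthetic stand-in data, not the asker's R code): obtain class-1 *probabilities* from each fitted model on a held-out test set and draw all three curves on one axis. A perfect AUC of 1 usually means the curve was computed on the training data, or from hard class labels instead of probabilities.

```python
# Sketch: three classifiers, one ROC plot. Data is synthetic.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, auc
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {'LDA': LinearDiscriminantAnalysis(),
          'Classification tree': DecisionTreeClassifier(max_depth=3),
          'Naive Bayes': GaussianNB()}

for name, model in models.items():
    probs = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    fpr, tpr, _ = roc_curve(y_te, probs)
    plt.plot(fpr, tpr, label=f'{name} (AUC = {auc(fpr, tpr):.2f})')

plt.plot([0, 1], [0, 1], linestyle='--')   # chance line
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()
```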

Quanteda package, Naive Bayes: How can I predict on different-featured test data?

Submitted by 点点圈 on 2019-12-06 09:21:06
Question: I used quanteda::textmodel_NB to create a model that categorises text into one of two categories. I fitted the model on a training data set from last summer. Now, I am trying to use it this summer to categorise new text we get here at work. I tried doing this and got the following error:

```
Error in predict.textmodel_NB_fitted(model, test_dfm) :
  feature set in newdata different from that in training set
```

The code in the function that generates the error can be found here at lines 157 to 165.
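The error says the test dfm contains a different set of features than the training dfm; the standard remedy in quanteda is to conform the test dfm to the training features before calling predict (recent quanteda versions provide dfm_match for this). The principle, sketched in Python with scikit-learn standing in for quanteda (illustrative texts only):

```python
# Sketch: build the test matrix in the *training* feature space, so
# prediction never sees unknown features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ['budget taxes economy', 'goals match referee']
train_labels = ['politics', 'sports']
test_texts = ['new taxes and a new referee']   # contains unseen words too

vectorizer = CountVectorizer().fit(train_texts)   # vocabulary frozen here
clf = MultinomialNB().fit(vectorizer.transform(train_texts), train_labels)

# transform() reuses the training vocabulary: unseen words are dropped and
# missing ones become zero counts, so the feature sets always line up.
print(clf.predict(vectorizer.transform(test_texts)))
```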