naivebayes

Ways to improve the accuracy of a Naive Bayes Classifier?

爱⌒轻易说出口 submitted on 2019-11-30 10:05:48
Question: I am using a Naive Bayes Classifier to categorize several thousand documents into 30 different categories. I have implemented a Naive Bayes Classifier, and with some feature selection (mostly filtering useless words), I've gotten about 30% test accuracy, with 45% training accuracy. This is significantly better than random, but I want it to be better. I've tried implementing AdaBoost with NB, but it does not appear to give appreciably better results (the literature seems split on this: some papers say AdaBoost with NB doesn't give better results, others say it does). Do you know of any other
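Beyond filtering useless words, one common next step is feature selection by document frequency: drop words that appear in too few documents to generalize, or in too many to discriminate between categories. A minimal sketch of the idea (the helper name and thresholds are illustrative, not from the question):

```python
from collections import Counter

def select_features(tokenized_docs, min_df=2, max_df_ratio=0.5):
    """Keep words appearing in at least min_df documents but in no
    more than max_df_ratio of all documents; very common words carry
    little class information for a naive Bayes classifier."""
    n_docs = len(tokenized_docs)
    df = Counter()
    for doc in tokenized_docs:
        df.update(set(doc))  # count each word once per document
    return {w for w, c in df.items()
            if c >= min_df and c / n_docs <= max_df_ratio}

docs = [["spam", "buy", "now"], ["buy", "cheap", "spam"],
        ["meeting", "agenda", "now"], ["agenda", "notes", "buy"]]
print(sorted(select_features(docs)))  # ['agenda', 'now', 'spam']
```

Here "buy" is dropped for appearing in 3 of 4 documents, while "cheap" and "notes" are dropped as singletons.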

What is the difference between a Bayesian network and a naive Bayes classifier?

£可爱£侵袭症+ submitted on 2019-11-30 00:16:29
Question: What is the difference between a Bayesian network and a Naive Bayes classifier? I noticed one is implemented in Matlab simply as classify while the other has an entire net toolbox. If you could also explain in your answer which one is more likely to provide better accuracy, I would be grateful (not a prerequisite). Answer (Richante): Short answer, if you're only interested in solving a prediction task: use Naive Bayes. A Bayesian network (which has a good Wikipedia page) models relationships between features in a very general way. If you know what these relationships are, or have enough data to derive them, then
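The structural difference shows up in the naive Bayes scoring rule itself, which assumes every feature is conditionally independent given the class; a Bayesian network replaces that assumption with an explicit dependency graph. A toy sketch of the naive Bayes side (all probabilities below are made-up numbers for illustration):

```python
import math

def nb_log_score(prior, cond_probs, features):
    """Naive Bayes class score under conditional independence:
    log P(c | f1..fn) is proportional to log P(c) + sum of log P(fi | c)."""
    score = math.log(prior)
    for f in features:
        score += math.log(cond_probs[f])
    return score

# Invented per-class probabilities for a two-word "document":
score_spam = nb_log_score(0.4, {"buy": 0.3, "now": 0.2}, ["buy", "now"])
score_ham = nb_log_score(0.6, {"buy": 0.05, "now": 0.1}, ["buy", "now"])
print("spam" if score_spam > score_ham else "ham")  # spam
```

A Bayesian network would instead need, say, P(now | buy, c) if the two words were modeled as dependent, which is exactly the generality naive Bayes gives up for speed and simplicity.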

How to use the a k-fold cross validation in scikit with naive bayes classifier and NLTK

99封情书 submitted on 2019-11-28 17:52:05
I have a small corpus and I want to calculate the accuracy of a naive Bayes classifier using 10-fold cross-validation; how can I do it? Your options are either to set this up yourself or to use something like NLTK-Trainer, since NLTK doesn't directly support cross-validation for machine learning algorithms. I'd recommend just using another module to do this for you, but if you really want to write your own code, you could do something like the following. Supposing you want 10-fold, you would have to partition your training set into 10 subsets, train on 9/10, test on the remaining 1/10, and
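The partitioning step described above can be sketched in plain Python (the helper name is illustrative); each fold serves once as the held-out test set, and you would train and score your NLTK classifier inside the loop, then average the 10 accuracies:

```python
def k_fold_indices(n_items, k=10):
    """Split range(n_items) into k roughly equal folds; yield
    (train_indices, test_indices) for each of the k rounds."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [j for f in range(k) if f != held_out for j in folds[f]]
        yield train_idx, test_idx

# Sanity check: every item is tested exactly once across the rounds.
n = 25
seen = []
for train, test in k_fold_indices(n, k=10):
    assert len(train) + len(test) == n
    seen.extend(test)
print(sorted(seen) == list(range(n)))  # True
```

In practice you would shuffle the corpus first so each fold gets a representative mix of labels.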

Classifying Documents into Categories

别说谁变了你拦得住时间么 submitted on 2019-11-28 15:10:58
I've got about 300k documents stored in a Postgres database that are tagged with topic categories (there are about 150 categories in total). I have another 150k documents that don't yet have categories. I'm trying to find the best way to programmatically categorize them. I've been exploring NLTK and its Naive Bayes Classifier. It seems like a good starting point (if you can suggest a better classification algorithm for this task, I'm all ears). My problem is that I don't have enough RAM to train the NaiveBayesClassifier on all 150 categories/300k documents at once (training on 5 categories used 8GB)
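One way to sidestep the RAM limit is to stream documents from the database one at a time and accumulate only the per-category counts that naive Bayes needs, so memory scales with vocabulary size rather than corpus size. A hedged sketch of that idea (the function and toy data are illustrative, not NLTK's API):

```python
from collections import defaultdict

def train_streaming(doc_iter):
    """doc_iter yields (category, tokens) pairs, e.g. row by row
    from a database cursor; only aggregate counts are kept in RAM."""
    word_counts = defaultdict(lambda: defaultdict(int))
    cat_counts = defaultdict(int)
    for category, tokens in doc_iter:
        cat_counts[category] += 1
        for tok in tokens:
            word_counts[category][tok] += 1
    return cat_counts, word_counts

docs = [("sports", ["goal", "match"]), ("tech", ["code", "bug"]),
        ("sports", ["match", "win"])]
cats, words = train_streaming(iter(docs))
print(cats["sports"], words["sports"]["match"])  # 2 2
```

From these counts you can estimate the priors and smoothed conditional probabilities that a naive Bayes prediction needs, without ever holding all 300k documents in memory.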

Implementing Bag-of-Words Naive-Bayes classifier in NLTK

∥☆過路亽.° submitted on 2019-11-27 10:42:07
I basically have the same question as this guy. The example in the NLTK book for the Naive Bayes classifier considers only whether a word occurs in a document as a feature; it doesn't consider the frequency of the words as the feature to look at ("bag-of-words"). One of the answers seems to suggest this can't be done with the built-in NLTK classifiers. Is that the case? How can I do frequency/bag-of-words NB classification with NLTK? scikit-learn has an implementation of multinomial naive Bayes, which is the right variant of naive Bayes in this situation. A support vector machine (SVM)
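A minimal example of the scikit-learn route mentioned in the answer: CountVectorizer produces word-frequency (bag-of-words) features, and MultinomialNB consumes those counts directly (the tiny training corpus below is invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["free money now", "free free cash",
               "meeting at noon", "project meeting notes"]
train_labels = ["spam", "spam", "ham", "ham"]

vec = CountVectorizer()              # word-frequency bag-of-words features
X = vec.fit_transform(train_texts)   # sparse document-term count matrix
clf = MultinomialNB().fit(X, train_labels)

print(clf.predict(vec.transform(["free cash meeting"]))[0])  # spam
```

Unlike the NLTK book's boolean word-presence features, the count matrix here records how many times each word occurs, which is exactly what the multinomial variant of naive Bayes models.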

Save Naive Bayes Trained Classifier in NLTK

假如想象 submitted on 2019-11-27 10:27:55
I'm slightly confused about how I save a trained classifier. Re-training a classifier each time I want to use it is obviously really bad and slow; how do I save it and then load it again when I need it? Code is below; thanks in advance for your help. I'm using Python with the NLTK Naive Bayes Classifier. classifier = nltk.NaiveBayesClassifier.train(training_set) # look inside the classifier train method in the source code of the NLTK library def train(labeled_featuresets, estimator=nltk.probability.ELEProbDist): # Create the P(label) distribution label_probdist = estimator(label
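The standard answer is to serialize the trained classifier once with Python's pickle module and reload it later instead of retraining. A sketch of the round trip (the dict below is a stand-in for the trained NLTK classifier object, which pickles the same way):

```python
import pickle

# In real use: classifier = nltk.NaiveBayesClassifier.train(training_set)

def save_classifier(clf, path):
    """Serialize a trained classifier to disk."""
    with open(path, "wb") as f:
        pickle.dump(clf, f)

def load_classifier(path):
    """Restore a previously saved classifier without retraining."""
    with open(path, "rb") as f:
        return pickle.load(f)

save_classifier({"label_probdist": 0.5}, "classifier.pickle")
print(load_classifier("classifier.pickle"))  # {'label_probdist': 0.5}
```

Train once, call save_classifier, and every later run only pays the (fast) cost of load_classifier.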

Handling continuous data in Spark NaiveBayes

只愿长相守 submitted on 2019-11-27 07:31:10
Question: As per the official documentation of Spark NaiveBayes: it supports Multinomial NB (see here), which can handle finitely supported discrete data. How can I handle continuous data (for example: the percentage of something in a document) in Spark NaiveBayes? Answer 1: The current implementation can process only binary features, so for good results you'll have to discretize and encode your data. For discretization you can use either Bucketizer or QuantileDiscretizer. The former is less expensive and might be a
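The bucketing that Spark's Bucketizer performs is easy to mirror in plain Python, which makes the semantics concrete: a continuous value maps to the index of the interval [splits[i], splits[i+1]) containing it, with the last interval right-closed. The split points below are illustrative:

```python
import bisect

def bucketize(value, splits):
    """Map a continuous value to a bucket index, mirroring the
    interval semantics of Spark ML's Bucketizer."""
    if not splits[0] <= value <= splits[-1]:
        raise ValueError("value outside bucket range")
    idx = bisect.bisect_right(splits, value) - 1
    # clamp so the final split point lands in the last bucket
    return min(idx, len(splits) - 2)

splits = [0.0, 0.25, 0.5, 0.75, 1.0]  # e.g. a percentage feature in [0, 1]
print([bucketize(v, splits) for v in [0.1, 0.25, 0.99, 1.0]])  # [0, 1, 3, 3]
```

In Spark itself you would pass the same splits list to Bucketizer (or let QuantileDiscretizer choose them from the data), then feed the resulting bucket indices, suitably encoded, to NaiveBayes.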