Save Naive Bayes Trained Classifier in NLTK

Posted by 感情迁移 on 2019-11-26 17:56:44

Question


I'm slightly confused about how to save a trained classifier. Re-training a classifier every time I want to use it is obviously slow and wasteful, so how do I save it and then load it again when I need it? The code is below; thanks in advance for your help. I'm using Python with the NLTK Naive Bayes classifier.

classifier = nltk.NaiveBayesClassifier.train(training_set)
# For reference, an abridged look inside the classifier's train method in the NLTK source code:

def train(labeled_featuresets, estimator=nltk.probability.ELEProbDist):
    # Create the P(label) distribution
    label_probdist = estimator(label_freqdist)
    # Create the P(fval|label, fname) distribution
    feature_probdist = {}
    return NaiveBayesClassifier(label_probdist, feature_probdist)

Answer 1:


To save:

import pickle
# The with block closes the file even if pickling raises an error
with open('my_classifier.pickle', 'wb') as f:
    pickle.dump(classifier, f)

To load later:

import pickle
# Only unpickle files you created yourself or otherwise trust
with open('my_classifier.pickle', 'rb') as f:
    classifier = pickle.load(f)
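
If the goal is to avoid retraining on every run, the save and load steps above can be wrapped in small helpers. This is only a minimal sketch: save_classifier, load_classifier, and load_or_train are hypothetical names, not part of NLTK, and the code assumes the classifier object can be pickled.

```python
import pickle
from pathlib import Path


def save_classifier(classifier, path):
    """Serialize a trained classifier (or any picklable object) to disk."""
    with open(path, "wb") as f:
        pickle.dump(classifier, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_classifier(path):
    """Load a previously pickled classifier. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def load_or_train(path, train_fn):
    """Return the cached classifier if the file exists; otherwise train and cache it."""
    if Path(path).exists():
        return load_classifier(path)
    classifier = train_fn()  # hypothetical zero-argument callable that does the training
    save_classifier(classifier, path)
    return classifier
```

With NLTK this could be called as load_or_train('my_classifier.pickle', lambda: nltk.NaiveBayesClassifier.train(training_set)), so the slow training step only runs when no saved file exists yet.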



Answer 2:


I ran into the same problem, and I could not save the object because it is an NLTK ELEFreqDistr class. In any case, NLTK is extremely slow: training took 45 minutes on a decent-sized set, so I decided to implement my own version of the algorithm (run it with PyPy, or rename it to .pyx and install Cython). It takes about 3 minutes on the same set, and it can simply save its data as JSON (I'll implement pickle, which is faster/better).

I started a simple GitHub project; check out the code here
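
The block below is not the code from that project. It is only a rough sketch of the idea of keeping a Naive Bayes model as plain counts so that it can be written out as JSON. TinyNaiveBayes and all of its methods are hypothetical names, labels are assumed to be strings (JSON object keys are always strings), and add-one smoothing is my own choice here rather than what NLTK does by default.

```python
import json
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Hypothetical minimal Naive Bayes over bag-of-words features (not NLTK's)."""

    def __init__(self):
        self.label_counts = Counter()            # label -> number of documents
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()

    def train(self, labeled_docs):
        """labeled_docs is an iterable of (list_of_words, label) pairs."""
        for words, label in labeled_docs:
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def classify(self, words):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            n_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)
            for w in words:
                # Add-one smoothing so an unseen word does not zero out a label
                count = self.word_counts[label][w] + 1
                score += math.log(count / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def to_json(self):
        return json.dumps({
            "label_counts": dict(self.label_counts),
            "word_counts": {lbl: dict(c) for lbl, c in self.word_counts.items()},
        })

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        model = cls()
        model.label_counts = Counter(data["label_counts"])
        for label, counts in data["word_counts"].items():
            model.word_counts[label] = Counter(counts)
            model.vocab.update(counts)  # rebuild the vocabulary from the stored words
        return model
```

Because the whole model is just nested dictionaries of counts, the JSON text can be written to a file and read back without pickle, at the cost of only supporting this one simple model layout.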




Answer 3:


To retrain the pickled classifier:

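import pickle
import nltk

with open('originalnaivebayes5k.pickle', 'rb') as f:
    classifier = pickle.load(f)
# train() is a classmethod that builds a new classifier from training_set alone,
# so its return value has to be kept; it does not update the loaded object in place
classifier = classifier.train(training_set)
print('Accuracy:', nltk.classify.accuracy(classifier, testing_set) * 100)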


Source: https://stackoverflow.com/questions/10017086/save-naive-bayes-trained-classifier-in-nltk
