Classifying data with naive Bayes using LingPipe

℡╲_俬逩灬. submitted on 2019-12-06 04:47:23

Question


I want to classify data into different classes based on its content. I did this with a naive Bayes classifier, and it outputs the best category for each item. But now I want news that does not belong to any of the training categories to be classified into an "others" class. I can't manually add every kind of data outside the training categories to that class, because there is a vast number of other categories. Is there any way to classify this other data?

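import java.io.File;
import java.io.IOException;

import com.aliasi.classify.Classification;
import com.aliasi.classify.Classified;
import com.aliasi.classify.DynamicLMClassifier;
import com.aliasi.lm.NGramProcessLM;
import com.aliasi.util.Files;

// Wrapper class name is illustrative; the question only showed the members.
public class NewsTrainer {

    private static final File TRAINING_DIR = new File("4news-train");
    private static final File TESTING_DIR = new File("4news-test");
    private static final String[] CATEGORIES = { "c1", "c2", "c3", "others" };

    // Longest character n-gram modelled by each category's language model.
    private static final int NGRAM_SIZE = 6;

    public static void main(String[] args) throws ClassNotFoundException, IOException {
        // One character n-gram process language model per category.
        DynamicLMClassifier<NGramProcessLM> classifier =
            DynamicLMClassifier.createNGramProcess(CATEGORIES, NGRAM_SIZE);
        for (int i = 0; i < CATEGORIES.length; ++i) {
            // Training texts for a category live in 4news-train/<category>/.
            File classDir = new File(TRAINING_DIR, CATEGORIES[i]);
            if (!classDir.isDirectory()) {
                String msg = "Could not find training directory=" + classDir + "\nTraining directory not found";
                System.out.println(msg); // in case exception gets lost in shell
                throw new IllegalArgumentException(msg);
            }

            String[] trainingFiles = classDir.list();
            for (int j = 0; j < trainingFiles.length; ++j) {
                File file = new File(classDir, trainingFiles[j]);
                String text = Files.readFromFile(file, "ISO-8859-1");
                System.out.println("Training on " + CATEGORIES[i] + "/" + trainingFiles[j]);
                // Pair the text with its category label and feed it to the classifier.
                Classification classification = new Classification(CATEGORIES[i]);
                Classified<CharSequence> classified = new Classified<CharSequence>(text, classification);
                classifier.handle(classified);
            }
        }
    }
}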

Answer 1:


Just serialize the object: write the trained classifier to a file, and that file becomes your model.

Then, for testing, you only need to load the model and pass the data to it; there is no need to retrain each time. That will make things much easier for you.
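A minimal sketch of the "save once, reload for testing" idea using only plain Java serialization. The class `ModelStore` and the `HashMap` standing in for a trained model are hypothetical; they only illustrate the round trip. For a LingPipe classifier specifically, the library has its own mechanism (as far as I know, `AbstractExternalizable.compileTo(...)` to write and `AbstractExternalizable.readObject(...)` to read); check the LingPipe Javadoc before relying on it.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: persist any Serializable "model" and load it back.
public class ModelStore {

    // Write the trained object to a file.
    static void save(Serializable model, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    // Read the object back; the caller casts it to the expected type.
    static Object load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in "model": per-category counts instead of real trained parameters.
        HashMap<String, Integer> model = new HashMap<>();
        model.put("c1", 12);
        model.put("c2", 7);

        File file = File.createTempFile("model", ".ser");
        file.deleteOnExit();
        save(model, file); // done once, after training

        @SuppressWarnings("unchecked")
        Map<String, Integer> reloaded = (Map<String, Integer>) load(file); // done at test time
        System.out.println(reloaded.equals(model)); // prints true
    }
}
```

The training program calls `save` once; the testing program only calls `load` and then classifies, so no retraining happens per run.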




Answer 2:


Naive Bayes gives you a "confidence" for each classification, because it computes

$$P(y \mid x) \propto P(y)\,P(x \mid y)$$
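To turn the unnormalized scores P(y)P(x|y) into actual posteriors, divide by their sum over classes, which is proportional to P(x). The sketch below assumes the scores are available as log-probabilities (products of many small likelihoods underflow otherwise) and uses the log-sum-exp trick; the class name `Posterior` is hypothetical. If you stay with LingPipe, the result of the trained classifier's `classify(...)` should already carry these conditional probabilities (I believe via `conditionalProbability(rank)` on the returned classification); verify against the Javadoc.

```java
import java.util.Arrays;

// Sketch: normalize joint log scores log P(y) + log P(x|y) into posteriors P(y|x).
public class Posterior {

    static double[] fromLogJoint(double[] logJoint) {
        if (logJoint.length == 0) {
            throw new IllegalArgumentException("need at least one class");
        }
        double max = Arrays.stream(logJoint).max().getAsDouble();
        double[] p = new double[logJoint.length];
        double sum = 0.0;
        for (int i = 0; i < logJoint.length; i++) {
            p[i] = Math.exp(logJoint[i] - max); // shift by the max so exp() cannot underflow to all zeros
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum; // the sum is proportional to P(x), the normalizer
        }
        return p;
    }

    public static void main(String[] args) {
        // Hypothetical two-class example: P(y) = {0.5, 0.5}, P(x|y) = {0.02, 0.06}.
        double[] logJoint = {
            Math.log(0.5) + Math.log(0.02),
            Math.log(0.5) + Math.log(0.06),
        };
        // Exact posteriors are 0.25 and 0.75; printed values may differ in the last digits.
        System.out.println(Arrays.toString(fromLogJoint(logJoint)));
    }
}
```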

Up to the normalization by P(x), this is the probability that x belongs to class y. You can simply apply a cut-off to this value and say that

$$\operatorname{cl}(x) = \text{"other"} \iff \max_{y} P(y \mid x) < T$$

where T can be, for example, the minimum confidence on the training set, i.e. the smallest posterior that any training example receives for its true class:

$$T = \min_{(x, y) \in \text{training set}} P(y \mid x)$$
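The cut-off rule and the choice of T can be sketched together as follows. The inputs are assumptions: one posterior distribution P(·|x) per item (for example, produced by the trained classifier) and, for the training set, the index of each example's true class. `RejectOption`, `threshold` and `classify` are hypothetical names. Note that here "other" is a rejection label, not a category the classifier was trained on.

```java
// Sketch of the reject option: T = min over training pairs (x, y) of P(y|x),
// then a new item is labelled "other" when its best posterior is below T.
public class RejectOption {

    // Smallest posterior any training example gets for its true class.
    static double threshold(double[][] trainPosteriors, int[] trueLabels) {
        if (trainPosteriors.length == 0 || trainPosteriors.length != trueLabels.length) {
            throw new IllegalArgumentException("need one true label per training example");
        }
        double t = Double.POSITIVE_INFINITY;
        for (int i = 0; i < trainPosteriors.length; i++) {
            t = Math.min(t, trainPosteriors[i][trueLabels[i]]);
        }
        return t;
    }

    // cl(x) = "other" iff max_y P(y|x) < T; otherwise the arg-max category.
    static String classify(double[] posterior, String[] categories, double t) {
        int best = 0;
        for (int i = 1; i < posterior.length; i++) {
            if (posterior[i] > posterior[best]) best = i;
        }
        return posterior[best] < t ? "other" : categories[best];
    }

    public static void main(String[] args) {
        String[] categories = {"c1", "c2", "c3"};
        double[][] train = {
            {0.90, 0.05, 0.05},
            {0.20, 0.70, 0.10},
            {0.10, 0.15, 0.75},
        };
        double t = threshold(train, new int[]{0, 1, 2});
        System.out.println(t);                                                       // prints 0.7
        System.out.println(classify(new double[]{0.10, 0.85, 0.05}, categories, t)); // prints c2
        System.out.println(classify(new double[]{0.40, 0.35, 0.25}, categories, t)); // prints other
    }
}
```

Because the minimum is set by the single least confident training example, one noisy or mislabelled document can pull T very low; using a low percentile of the training posteriors instead is a possible, more robust variant of the same idea.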


Source: https://stackoverflow.com/questions/21849642/classifying-data-with-naive-bayes-using-lingpipe
