lingpipe

Using topic modeling Java toolkit

这一生的挚爱 posted on 2020-01-17 09:05:11
Question: I'm working on text classification and I want to use topic models (LDA). My corpus consists of at least 24,000 Persian news documents. Each document in the corpus is a set of (keyword, weight) pairs extracted from the news. I found two Java toolkits: MALLET and LingPipe. I've read the MALLET tutorial on importing data, and it takes plain text as input, not the format that I have. Is there any way to change that? I also read a little about LingPipe; the example from the tutorial was using
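One possible workaround for the format mismatch described above is to turn each (keyword, weight) document back into pseudo-text, repeating every keyword in proportion to its weight, so that a plain-text importer can read it. The sketch below is only an illustration under assumptions: the class name `WeightedKeywordExpander`, the `scale` parameter, and the rule of emitting every keyword at least once are all hypothetical choices, and no MALLET or LingPipe API is used. ASCII keywords stand in for Persian ones; Java strings are Unicode, so the same code applies.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: not part of MALLET or LingPipe.
final class WeightedKeywordExpander {

    // Repeats each keyword round(weight * scale) times, but at least once,
    // and joins the result with spaces to form one pseudo-document.
    static String expand(Map<String, Double> weights, double scale) {
        StringJoiner text = new StringJoiner(" ");
        for (Map.Entry<String, Double> entry : weights.entrySet()) {
            long repeats = Math.max(1, Math.round(entry.getValue() * scale));
            for (long i = 0; i < repeats; i++) {
                text.add(entry.getKey());
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is deterministic.
        Map<String, Double> doc = new LinkedHashMap<>();
        doc.put("economy", 0.5);
        doc.put("bank", 0.25);
        System.out.println(expand(doc, 4.0)); // prints "economy economy bank"
    }
}
```

The scale factor trades precision for document length: a larger value preserves finer weight differences but produces longer pseudo-documents. Whether this approximation is acceptable for LDA depends on how the weights were computed, which the question does not say.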

Classifying data with naive bayes using LingPipe

℡╲_俬逩灬. posted on 2019-12-06 04:47:23
Question: I want to classify data into different classes based on its content. I did this with a naive Bayes classifier, and the output is the best-matching category for each item. Now I want news that does not belong to any category in the training set to be classified into an "others" class. I can't manually assign every such item to a class, because the data outside the training set spans a vast number of other categories. Is there any way to classify this other data? private static File TRAINING_DIR = new File(
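A common way to approximate an "others" class without training data for it is to reject low-confidence predictions: if the best category's probability falls below a threshold, the item is labeled "others". The following is a minimal sketch under assumptions, not the asker's code. The class `OthersFallback`, the `minConfidence` parameter, and the probability map are hypothetical; the per-category probabilities are assumed to come from whatever classifier is in use, and no LingPipe call appears here.

```java
import java.util.Map;

// Hypothetical helper: wraps any classifier that can report
// a probability per category.
final class OthersFallback {

    static final String OTHERS = "others";

    // Returns the most probable category, or "others" when even the
    // best category is less probable than minConfidence.
    static String classify(Map<String, Double> probabilities, double minConfidence) {
        String best = OTHERS;
        double bestProbability = 0.0;
        for (Map.Entry<String, Double> entry : probabilities.entrySet()) {
            if (entry.getValue() > bestProbability) {
                best = entry.getKey();
                bestProbability = entry.getValue();
            }
        }
        return bestProbability >= minConfidence ? best : OTHERS;
    }

    public static void main(String[] args) {
        // Confident prediction: kept as-is.
        System.out.println(classify(Map.of("sports", 0.92, "politics", 0.08), 0.8)); // prints "sports"
        // Ambiguous prediction: falls back to "others".
        System.out.println(classify(Map.of("sports", 0.55, "politics", 0.45), 0.8)); // prints "others"
    }
}
```

Naive Bayes probabilities tend to be pushed toward 0 or 1, so a threshold like this usually needs tuning on held-out data that includes out-of-category examples.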