Weka - Classifier returns the same distribution for any input

情到浓时终转凉″ posted on 2019-12-14 03:30:04

Question


I'm trying to build a naive Bayes classifier that classifies text into one of two classes. Everything works in the Weka Explorer GUI, but when I recreate it in code, I get the same output no matter what input I try to classify.

In code, I get the same evaluation metrics as in the GUI (81% accuracy), but whenever I create a new instance and classify it, I get the same class distribution no matter what input I use.

Below is my code. It's in Scala, but it is pretty straightforward:

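import java.util
import scala.collection.JavaConverters._

import weka.classifiers.Evaluation
import weka.classifiers.bayes.NaiveBayesMultinomial
import weka.classifiers.meta.FilteredClassifier
import weka.core.{Attribute, DenseInstance, Instances}
import weka.core.converters.ConverterUtils.DataSource
import weka.filters.unsupervised.attribute.StringToWordVector

//Building the classifier:
val trainingSet = new Instances(new DataSource("/my/dataset.arff").getDataSet)
trainingSet.setClassIndex(3)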

val filter = new StringToWordVector
filter.setAttributeIndicesArray( (0 to 2).toArray )
val classifier = new FilteredClassifier
classifier.setFilter(new StringToWordVector(1000000))
classifier.setClassifier(new NaiveBayesMultinomial)
classifier.buildClassifier(trainingSet)

//Evaluation (this prints about 80% accuracy)
val eval = new Evaluation(trainingSet)
eval.evaluateModel(classifier, trainingSet)

println(eval.toSummaryString)

//Attempting to use the classifier:

val atts = new util.ArrayList[Attribute]
atts.add(new Attribute("sentence", true))
atts.add(new Attribute("parts_of_speech", true))
atts.add(new Attribute("dependency_graph", true))
atts.add(new Attribute("the_shizzle_clazz", SentenceType.values().map(_.name()).toSeq.asJava ))

val unlabeledInstances = new Instances("unlabeled", atts, 1)
unlabeledInstances.setClassIndex( 3 )

val instance = new DenseInstance(4)

unlabeledInstances.add(instance)
instance.setDataset(unlabeledInstances)

instance.setValue(0, parsed.sentence)
instance.setValue(1, parsed.posTagsStr)
instance.setValue(2, parsed.depsGraphStr)

val distrib = classifier.distributionForInstance(unlabeledInstances.firstInstance())

distrib.foreach(println)

No matter what input I give, the output of distrib is always:

0.44556173367704455
0.5544382663229555

Any ideas what I'm doing wrong? Would greatly appreciate any help.


Answer 1:


It looks like the magic line was:

instance.setClassMissing()

Adding that made it work. :)
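
For reference, here is a minimal sketch of how the classification step might look with that line in place. The `ClassifySketch` object, the `distributionFor` helper, and the `classLabels` parameter are hypothetical names introduced for illustration, and the labels are assumed to be listed in the same order as the class attribute in the training ARFF file. The sketch also sets every value before calling `add`, because `Instances.add` stores a shallow copy of the instance, so later changes to the original object would not reach the copy held by the dataset. It assumes Weka 3.7 or later on the classpath and a Scala version where `scala.collection.JavaConverters` is available.

```scala
import java.util
import scala.collection.JavaConverters._

import weka.classifiers.Classifier
import weka.core.{Attribute, DenseInstance, Instances}

object ClassifySketch {

  // Hypothetical helper (not from the original post): builds a one-row
  // dataset with the same layout as the training data and returns the
  // class distribution predicted for it.
  def distributionFor(classifier: Classifier,
                      classLabels: Seq[String], // assumed: same order as in the training ARFF
                      sentence: String,
                      posTags: String,
                      depsGraph: String): Array[Double] = {
    // Three string attributes plus a nominal class attribute, as in the question.
    val atts = new util.ArrayList[Attribute]
    atts.add(new Attribute("sentence", true))
    atts.add(new Attribute("parts_of_speech", true))
    atts.add(new Attribute("dependency_graph", true))
    atts.add(new Attribute("the_shizzle_clazz", classLabels.asJava))

    val unlabeled = new Instances("unlabeled", atts, 1)
    unlabeled.setClassIndex(3)

    // The instance needs a dataset reference before string values can be set.
    val instance = new DenseInstance(4)
    instance.setDataset(unlabeled)
    instance.setValue(0, sentence)
    instance.setValue(1, posTags)
    instance.setValue(2, depsGraph)
    instance.setClassMissing() // the line from the answer: mark the class as unknown

    // add() stores a copy, so it is called only after all values are set.
    unlabeled.add(instance)
    classifier.distributionForInstance(unlabeled.firstInstance())
  }
}
```

With the classifier trained as in the question, this could be called as `ClassifySketch.distributionFor(classifier, SentenceType.values().map(_.name()).toSeq, parsed.sentence, parsed.posTagsStr, parsed.depsGraphStr)`.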



Source: https://stackoverflow.com/questions/47830946/weka-classifier-returns-the-same-distribution-for-any-input
