Question
I'm trying to build a naive Bayes classifier that classifies text into one of two classes. Everything works great in the Weka Explorer GUI, but when I try to recreate it in code, I get the same output no matter what input I try to classify.
In code, evaluation gives the same metrics as the GUI (81% accuracy), but whenever I create a new instance and classify it, I get the same class distribution no matter what input I use.
Below is my code. It's in Scala, but it is pretty straightforward:
//Building the classifier:
val instances = new Instances(new DataSource("/my/dataset.arff").getDataSet)
instances.setClassIndex(3)
val filter = new StringToWordVector
filter.setAttributeIndicesArray( (0 to 2).toArray )
val classifier = new FilteredClassifier
classifier.setFilter(new StringToWordVector(1000000))
classifier.setClassifier(new NaiveBayesMultinomial)
classifier.buildClassifier(trainingSet)
//Evaluation (this prints about 80% accuracy)
val eval = new Evaluation(trainingSet)
eval.evaluateModel(classifier, trainingSet)
println(eval.toSummaryString)
//Attempting to use the classifier:
val atts = new util.ArrayList[Attribute]
atts.add(new Attribute("sentence", true))
atts.add(new Attribute("parts_of_speech", true))
atts.add(new Attribute("dependency_graph", true))
atts.add(new Attribute("the_shizzle_clazz", SentenceType.values().map(_.name()).toSeq.asJava ))
val unlabeledInstances = new Instances("unlabeled", atts, 1)
unlabeledInstances.setClassIndex( 3 )
val instance = new DenseInstance(4)
unlabeledInstances.add(instance)
instance.setDataset(unlabeledInstances)
instance.setValue(0, parsed.sentence)
instance.setValue(1, parsed.posTagsStr)
instance.setValue(2, parsed.depsGraphStr)
val distrib = classifier.distributionForInstance(unlabeledInstances.firstInstance())
distrib.foreach(println)
No matter what input I give, the output of distrib is always:
0.44556173367704455
0.5544382663229555
Any ideas what I'm doing wrong? Would greatly appreciate any help.
Answer 1:
It looks like the magic line was:
instance.setClassMissing()
Adding that made it work. :)
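For reference, here is a minimal sketch of how the prediction step might look with that line in place. The helper name distributionFor and the header parameter are hypothetical and not from the original code; header is assumed to be an empty Instances object with the same attributes and class index as the training data (for example, new Instances(trainingSet, 0)). The sketch also sets the attribute values before the instance is used and passes the instance to the classifier directly, since Instances.add is documented to store a shallow copy and values set on the original afterwards may not reach the stored copy. That last point is my reading of the Weka API rather than something confirmed in this thread.

```scala
import weka.classifiers.Classifier
import weka.core.{DenseInstance, Instances}

object ClassifyOne {
  // Hypothetical helper (not part of the original post).
  // `header`: empty dataset with the training attributes and class index set.
  // `fields`: one string per non-class attribute, in attribute order.
  def distributionFor(classifier: Classifier,
                      header: Instances,
                      fields: Seq[String]): Array[Double] = {
    val instance = new DenseInstance(header.numAttributes)
    // The instance needs a dataset reference before string values can be set.
    instance.setDataset(header)
    fields.zipWithIndex.foreach { case (text, i) => instance.setValue(i, text) }
    // The line from the answer: mark the class value as unknown.
    instance.setClassMissing()
    // Classify this instance directly instead of a copy held by a dataset.
    classifier.distributionForInstance(instance)
  }
}
```

With the question's variables, a call could look like ClassifyOne.distributionFor(classifier, new Instances(trainingSet, 0), Seq(parsed.sentence, parsed.posTagsStr, parsed.depsGraphStr)), assuming the class attribute is the last of the four.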
Source: https://stackoverflow.com/questions/47830946/weka-classifier-returns-the-same-distribution-for-any-input