Question
I'm using a dataset with missing values for classification in Weka. As far as I understand, Weka automatically replaces them with the mode or mean of the training data (using the filter unsupervised/attribute/ReplaceMissingValues) when using a classifier like NaiveBayes.
I would like to try removing them instead, to see how this affects the quality of the classifier. Is there a filter to do that?
Answer 1:
My approach isn't perfect, because it becomes cumbersome once more than 5 or 6 attributes have missing values. If only a few attributes are affected, though, MultiFilter works for this purpose.
If 2 attributes have missing values, add RemoveWithValues to the MultiFilter twice, once per attribute.
- Load your data in Weka Explorer
- Select MultiFilter from the Filter area
- Click on MultiFilter and add one RemoveWithValues filter per attribute that has missing values
- Then configure each RemoveWithValues filter with the attribute index and set matchMissingValues to True
- Save the filter settings and click Apply in Explorer.
Answer 2:
Use the removeIf() method on weka.core.Instances, passing a method reference to the hasMissingValue method of weka.core.Instance, which returns true if a given Instance has any missing values.
Instances dataset = source.getDataSet(); // for some source
dataset.removeIf(Instance::hasMissingValue);
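To show what this kind of removal does without needing Weka on the classpath, here is a minimal plain-Java sketch of the same removeIf pattern. It is only a stand-in under stated assumptions: rows are plain double[] arrays rather than weka.core.Instance objects, a missing cell is encoded as Double.NaN (the same encoding Weka uses internally), and the class and method names (MissingValueDemo, removeRowsWithMissing, removeRowsMissingAt) are hypothetical, not part of Weka.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a dataset: each row is a double[] and a
// missing cell is encoded as Double.NaN. Not part of the Weka API.
public class MissingValueDemo {

    /** Returns true if any cell in the row is NaN, i.e. "missing". */
    public static boolean hasMissingValue(double[] row) {
        for (double v : row) {
            if (Double.isNaN(v)) {
                return true;
            }
        }
        return false;
    }

    /** Removes, in place, every row that has at least one missing cell. */
    public static void removeRowsWithMissing(List<double[]> rows) {
        rows.removeIf(MissingValueDemo::hasMissingValue);
    }

    /** Removes, in place, only the rows whose cell at the given column is missing. */
    public static void removeRowsMissingAt(List<double[]> rows, int column) {
        rows.removeIf(row -> Double.isNaN(row[column]));
    }

    public static void main(String[] args) {
        List<double[]> rows = new ArrayList<>(Arrays.asList(
            new double[] {1.0, 2.0},
            new double[] {Double.NaN, 3.0},
            new double[] {4.0, Double.NaN},
            new double[] {5.0, 6.0}));

        removeRowsWithMissing(rows);
        System.out.println(rows.size() + " complete rows remain"); // prints "2 complete rows remain"
    }
}
```

removeRowsWithMissing corresponds to the any-attribute removal in this answer, while removeRowsMissingAt corresponds to filtering on one attribute at a time, as in the MultiFilter approach. Note that removing rows this way shrinks the dataset, so it is worth comparing the instance count before and after.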
Source: https://stackoverflow.com/questions/18230939/remove-missing-values-in-weka