Question
I am new to Weka.
I am trying to run WEKA through its Java API and have found that the results from the WEKA GUI do not match the ones produced by my Java code.
I am running the RandomForest algorithm with a separate training set and test set.
Here is the code snippet:
import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

// Load the training set; the class is the last attribute.
DataSource ds = new DataSource(trainingFile);
Instances insts = ds.getDataSet();
insts.setClassIndex(insts.numAttributes() - 1);

// Configure the random forest.
RandomForest rf = new RandomForest();
// rf.setOptions(options);
rf.setNumFeatures(5);
rf.setSeed(1);
rf.setNumExecutionSlots(1);

// Keep only the attributes listed in attrList (Remove filter with inverted selection).
Remove remove = new Remove();
int[] attrs = WekaCustomisation.convertIntegers(attrList);
remove.setAttributeIndicesArray(attrs);
remove.setInvertSelection(true);
remove.setInputFormat(insts);
insts = Filter.useFilter(insts, remove);
insts.setClassIndex(insts.numAttributes() - 1);

// Train on the full filtered training set.
Instances train = new Instances(insts, 0, insts.numInstances());
rf.buildClassifier(train);

// Load the test set and apply the same attribute selection.
DataSource ds2 = new DataSource(testFile);
Instances instsTest = ds2.getDataSet();
remove.setInputFormat(instsTest);
instsTest = Filter.useFilter(instsTest, remove);
instsTest.setClassIndex(instsTest.numAttributes() - 1);
Instances testInstances = new Instances(instsTest);

// Evaluate the trained model on the test set and report the results.
Evaluation eval = new Evaluation(train);
eval.evaluateModel(rf, testInstances);
System.out.println(eval.toSummaryString());
out.write(eval.toSummaryString());
double roc = eval.areaUnderROC(0);
The confusion matrix produced by the WEKA GUI differs from the one produced by this code. What am I missing here?
Answer 1:
First, check whether the parameters and filters applied in the Weka GUI are the same as the ones you use in the code (take a look at the log generated in the GUI).
A second possibility is the random component in how Random Forest models are built: each decision tree is grown from randomly selected features of the dataset. As a result, training on the same training set can produce different models, and evaluating them on the test set gives different results.
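To make the second point concrete, here is a minimal sketch that uses only the Java standard library. It is a simplified illustration of seed-driven random feature selection, not Weka's actual implementation; the class name `FeatureSubsetDemo` and the method `sampleFeatureSubsets` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class FeatureSubsetDemo {

    // Hypothetical helper: for each of numTrees trees, pick k distinct feature
    // indices out of numFeatures. All randomness comes from the given seed.
    static List<List<Integer>> sampleFeatureSubsets(int numFeatures, int k,
                                                    int numTrees, long seed) {
        Random rng = new Random(seed);
        List<List<Integer>> subsets = new ArrayList<>();
        for (int t = 0; t < numTrees; t++) {
            List<Integer> indices = new ArrayList<>();
            for (int i = 0; i < numFeatures; i++) {
                indices.add(i);
            }
            // Shuffle with the shared generator, then keep the first k indices.
            Collections.shuffle(indices, rng);
            List<Integer> subset = new ArrayList<>(indices.subList(0, k));
            Collections.sort(subset);
            subsets.add(subset);
        }
        return subsets;
    }

    public static void main(String[] args) {
        List<List<Integer>> a = sampleFeatureSubsets(20, 5, 3, 1L);
        List<List<Integer>> b = sampleFeatureSubsets(20, 5, 3, 1L);
        List<List<Integer>> c = sampleFeatureSubsets(20, 5, 3, 2L);
        System.out.println("seed 1:       " + a);
        System.out.println("seed 1 again: " + b);
        System.out.println("seed 2:       " + c);
        // The same seed always reproduces the same subsets.
        System.out.println("same seed, same subsets: " + a.equals(b));
    }
}
```

In the question's code the analogous setting is `rf.setSeed(1)`. When comparing against the GUI, it is worth checking that the seed and the number of features shown in the GUI's classifier options match the values set in the code.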
Source: https://stackoverflow.com/questions/11872974/weka-ui-and-api-code-in-java-gives-different-results