Why does the WEKA Evaluation class need training instances?

Question

I do not understand why the Weka Evaluation class constructor needs the training instances to work.

Can anybody explain this to me?

In theory, the evaluation depends only on the trained model (cls in the code below) and the test data (TestingSet).

Thanks!

This is an example:

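import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;

// TrainingSet is the training Instances
// TestingSet is the testing Instances

// Build the classifier
Classifier cls = new NaiveBayes();
cls.buildClassifier(TrainingSet);

// Test the model
Evaluation eTest = new Evaluation(TrainingSet); // <- why is the training set needed here?
eTest.evaluateModel(cls, TestingSet);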

Answer 1:

From the UMass Boston Computer Science documentation on Weka:

Evaluation
public Evaluation(Instances data) throws java.lang.Exception

    Initializes all the counters for the evaluation.

    Parameters:
        data - set of training instances, to get some header information and prior class distribution information
    Throws:
        java.lang.Exception - if the class is not defined

You can take a look at the constructor source here.
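The Javadoc's phrase "prior class distribution information" can be made concrete with a small plain-Java sketch. It is a hypothetical simplification with made-up names (`ClassPriorSketch`, `priorsFromTraining`, `meanAbsError`, `relativeAbsoluteError`), not Weka's code: class priors are estimated from training labels with every count starting at 1 (an assumption modeled on Laplace smoothing), and a predictor that always outputs those priors serves as the baseline for a relative error measure. As far as I know, Weka's relative absolute error and its entropy-based statistics are computed against such a prior baseline, which is one reason the constructor asks for data in addition to the model.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified sketch (not Weka's implementation) of the
 * "prior class distribution" an evaluator can derive from training labels.
 * Class values are assumed to be 0-based indices.
 */
final class ClassPriorSketch {

    /** Class priors from training labels; every count starts at 1 (assumed Laplace smoothing). */
    static double[] priorsFromTraining(int[] trainLabels, int numClasses) {
        double[] counts = new double[numClasses];
        Arrays.fill(counts, 1.0);
        for (int label : trainLabels) {
            counts[label] += 1.0;
        }
        double total = Arrays.stream(counts).sum();
        double[] priors = new double[numClasses];
        for (int c = 0; c < numClasses; c++) {
            priors[c] = counts[c] / total;
        }
        return priors;
    }

    /**
     * Mean absolute error of a predictor that outputs the same class distribution
     * for every test instance (error averaged over classes and over instances).
     */
    static double meanAbsError(double[] distribution, int[] testLabels) {
        double sum = 0.0;
        for (int label : testLabels) {
            for (int c = 0; c < distribution.length; c++) {
                double target = (c == label) ? 1.0 : 0.0;
                sum += Math.abs(distribution[c] - target);
            }
        }
        return sum / ((double) testLabels.length * distribution.length);
    }

    /** Relative absolute error in percent: model error divided by the prior baseline's error. */
    static double relativeAbsoluteError(double modelMae, double priorMae) {
        return 100.0 * modelMae / priorMae;
    }

    public static void main(String[] args) {
        int[] trainLabels = {0, 0, 0, 1}; // illustrative training labels
        int[] testLabels = {0, 1};        // illustrative test labels
        double[] priors = priorsFromTraining(trainLabels, 2);
        double priorMae = meanAbsError(priors, testLabels);
        System.out.println("priors = " + Arrays.toString(priors));
        System.out.println("prior baseline MAE = " + priorMae);
        // A model with MAE 0.25 on the same test set would score about 50% here
        System.out.println("RAE = " + relativeAbsoluteError(0.25, priorMae) + " %");
    }
}
```

The point of the sketch is that the baseline depends on the training distribution, so the same test predictions can yield different relative errors depending on which data the evaluator was initialized with.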

Answer 2:

I have one possible solution to my own question. I was looking for a way to evaluate a test file using a classifier model that had previously been trained and saved to a file. The Evaluation class does not work for me because its constructor needs the training data. However, the classifier's classifyInstance method can be used instead.

The following code is an example:

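import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.Classifier;
import weka.core.Instances;
import weka.core.SerializationHelper;

public static void LoadAndTest(String filename_test, String filename_model) throws Exception {
    // Load the test set (ARFF); its header must match the data the model was trained on
    Instances data_test;
    try (BufferedReader datafile_test = new BufferedReader(new FileReader(filename_test))) {
        data_test = new Instances(datafile_test);
    }
    // The class is assumed to be the last attribute, as during training
    data_test.setClassIndex(data_test.numAttributes() - 1);

    // Deserialize the previously trained model
    Classifier cls = (Classifier) SerializationHelper.read(filename_model);
    int correct = 0;
    for (int i = 0; i < data_test.numInstances(); i++) {
        double pred = cls.classifyInstance(data_test.instance(i));
        double real = data_test.instance(i).classValue();
        // For a nominal class, both values are indices into the class attribute's labels
        if (pred == real) {
            correct++;
        }
    }
    double accuracy = (double) correct / data_test.numInstances();
    System.out.println("Accuracy = " + accuracy);
}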
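Beyond plain accuracy, `Evaluation` also keeps per-class bookkeeping. As far as I can tell from its constructor, it allocates a confusion matrix sized by the number of class values, which is one piece of the header information it reads from the `Instances` it is given. The following plain-Java sketch of that bookkeeping uses hypothetical names (`ConfusionSketch`, `confusionMatrix`, `accuracy`) and no Weka dependency, assuming predicted and actual classes are already available as 0-based indices (for instance, collected from `classifyInstance` and `classValue()` in the loop above):

```java
import java.util.Arrays;

/** Hypothetical sketch (not Weka's code) of confusion-matrix bookkeeping. */
final class ConfusionSketch {

    /** Rows are actual classes, columns are predicted classes. */
    static int[][] confusionMatrix(int numClasses, int[] actual, int[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("actual and predicted must have the same length");
        }
        // The size must be known up front: this is why the number of classes is needed
        int[][] m = new int[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) {
            m[actual[i]][predicted[i]]++;
        }
        return m;
    }

    /** Fraction of instances on the diagonal (correctly classified). */
    static double accuracy(int[][] m) {
        int correct = 0;
        int total = 0;
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < m[r].length; c++) {
                total += m[r][c];
                if (r == c) {
                    correct += m[r][c];
                }
            }
        }
        return total == 0 ? Double.NaN : (double) correct / total;
    }

    public static void main(String[] args) {
        int[] actual = {0, 0, 1, 1, 2};
        int[] predicted = {0, 1, 1, 1, 2};
        int[][] m = confusionMatrix(3, actual, predicted);
        System.out.println(Arrays.deepToString(m)); // [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
        System.out.println("Accuracy = " + accuracy(m)); // Accuracy = 0.8
    }
}
```

The rows-as-actual, columns-as-predicted layout matches the orientation of the confusion matrix Weka prints ("<-- classified as").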

Answer 3:

For mapping results back to their original values.

Most algorithms work on numeric data, so all the non-numeric values of a feature have to be converted into a numeric form. This mapping has to be unique: every occurrence of a given non-numeric value must be mapped to the same numeric value.
During training, the data pre-processor sees the data for the very first time, and while converting the non-numeric values it uses maps to remember the mapping.

For example, if all possible values for a feature are {yes, no, maybe}, these values could be mapped like this:
{"yes":1, "no":2, "maybe":3}

So an input feature (column) that looked like [yes,yes,no,yes,maybe,yes] would be converted into the internal form [1,1,2,1,3,1]. These numeric values are what the algorithms use.
In Weka, this information is stored in the training Instances. So when the evaluator gets a numeric prediction, it needs to convert that number back to its actual value: if the algorithm outputs 2, it needs the map to figure out that 2 corresponds to "no". To do this, it needs the mapping created before training, hence it asks for the training Instances.
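The mapping described in this answer can be sketched in plain Java. This is a hypothetical illustration with made-up names (`NominalMappingSketch`, `learnAndEncode`, `encode`, `decode`), not Weka's implementation. It uses 0-based indices rather than the 1-based numbers in the example, which, as far as I know, is also how Weka stores nominal values internally. In Weka itself the mapping lives in each nominal `Attribute` of the `Instances` header, where `Attribute.indexOfValue(String)` and `Attribute.value(int)` play roughly the roles of `encode` and `decode`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not Weka's API) of a value <-> index mapping built at training time. */
final class NominalMappingSketch {
    private final Map<String, Integer> toIndex = new LinkedHashMap<>();
    private final List<String> toValue = new ArrayList<>();

    /** Training time: return the value's index, assigning the next free one on first sight. */
    int learnAndEncode(String value) {
        Integer idx = toIndex.get(value);
        if (idx == null) {
            idx = toValue.size();
            toIndex.put(value, idx);
            toValue.add(value);
        }
        return idx;
    }

    /** Test time: reuse the training mapping only, so numbers stay comparable. */
    int encode(String value) {
        Integer idx = toIndex.get(value);
        if (idx == null) {
            throw new IllegalArgumentException("value not seen during training: " + value);
        }
        return idx;
    }

    /** Turn a numeric prediction back into the original value. */
    String decode(int index) {
        return toValue.get(index);
    }

    public static void main(String[] args) {
        NominalMappingSketch mapping = new NominalMappingSketch();
        String[] column = {"yes", "yes", "no", "yes", "maybe", "yes"};
        int[] encoded = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            encoded[i] = mapping.learnAndEncode(column[i]);
        }
        System.out.println(Arrays.toString(encoded)); // [0, 0, 1, 0, 2, 0]
        System.out.println(mapping.decode(1));         // no
    }
}
```

Keeping `encode` separate from `learnAndEncode` reflects the answer's point: at evaluation time the mapping must come from training, not be rebuilt from the test data.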

Note: AFAIK, the same logic applies in all ML frameworks, such as Weka, DL4J, etc.

Source: https://stackoverflow.com/questions/32605249/why-weka-evaluation-class-need-train-instances

Tags

java

weka