Question
I do not understand why the Weka Evaluation class constructor needs the training instances to work.
Can anybody explain it to me?
In theory, the evaluation depends only on the trained model (cls in the code below) and the test data (TestingSet).
Thanks!
Here is an example:
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;

// TrainingSet is the training Instances
// TestingSet is the testing Instances
// Build the classifier
Classifier cls = new NaiveBayes();
cls.buildClassifier(TrainingSet);
// Test the model
Evaluation eTest = new Evaluation(TrainingSet);
eTest.evaluateModel(cls, TestingSet);
Answer 1:
From the UMass Boston Computer Science documentation on Weka:
Evaluation
public Evaluation(Instances data) throws java.lang.Exception
Initializes all the counters for the evaluation.
Parameters:
data - set of training instances, to get some header information and prior class distribution information
Throws:
java.lang.Exception - if the class is not defined
You can take a look at the constructor source here.
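To illustrate what "prior class distribution information" can be used for, here is a minimal plain-Java sketch. It is not Weka's actual implementation: it assumes class labels are already encoded as integer indices, and the names PriorSketch, classPriors, majorityClass and majorityBaselineAccuracy are hypothetical. It estimates the class priors from the training labels and uses them to score a baseline that always predicts the most frequent training class, something that cannot be derived from the trained model and the test set alone.

```java
import java.util.Arrays;

final class PriorSketch {

    // Relative frequency of each class index in the training labels.
    static double[] classPriors(int[] trainLabels, int numClasses) {
        double[] priors = new double[numClasses];
        for (int label : trainLabels) {
            priors[label] += 1.0;
        }
        for (int c = 0; c < numClasses; c++) {
            priors[c] /= trainLabels.length;
        }
        return priors;
    }

    // Index of the class with the largest prior (the first one wins on ties).
    static int majorityClass(double[] priors) {
        int best = 0;
        for (int c = 1; c < priors.length; c++) {
            if (priors[c] > priors[best]) {
                best = c;
            }
        }
        return best;
    }

    // Accuracy on the test labels of always predicting the majority training class.
    static double majorityBaselineAccuracy(int[] trainLabels, int[] testLabels, int numClasses) {
        int majority = majorityClass(classPriors(trainLabels, numClasses));
        int hits = 0;
        for (int label : testLabels) {
            if (label == majority) {
                hits++;
            }
        }
        return (double) hits / testLabels.length;
    }

    public static void main(String[] args) {
        int[] train = {0, 0, 0, 1, 1, 2};
        int[] test = {0, 1, 1, 1};
        System.out.println("Priors: " + Arrays.toString(classPriors(train, 3)));
        System.out.println("Baseline accuracy: " + majorityBaselineAccuracy(train, test, 3));
    }
}
```

A classifier's accuracy on the test set can then be compared against this baseline to see whether the model learned anything beyond the class frequencies.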
Answer 2:
I have one possible solution to my own question. I was looking for a way to evaluate a test file using a classifier model that was previously trained and saved to a file. The Evaluation class does not work for me because it needs the training data in the constructor. However, the classifier's classifyInstance method can be used instead.
The following code is an example:
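import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.Classifier;
import weka.core.Instances;
import weka.core.SerializationHelper;

public static void LoadAndTest(String filename_test, String filename_model) throws Exception {
    // Load the test set (ARFF file) and use the last attribute as the class
    Instances data_test;
    try (BufferedReader datafile_test = new BufferedReader(new FileReader(filename_test))) {
        data_test = new Instances(datafile_test);
    }
    data_test.setClassIndex(data_test.numAttributes() - 1);
    // Load the previously trained and serialized model
    Classifier cls = (Classifier) SerializationHelper.read(filename_model);
    // Count the instances whose predicted class matches the actual class
    int act = 0;
    for (int i = 0; i < data_test.numInstances(); i++) {
        double pred = cls.classifyInstance(data_test.instance(i));
        double real = data_test.instance(i).classValue();
        if (pred == real) {
            act = act + 1;
        }
    }
    double pct = (double) act / (double) data_test.numInstances();
    System.out.println("Accuracy = " + pct);
}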
Answer 3:
They are needed for mapping results.
Most algorithms work on numeric data, so all the non-numeric values of a feature have to be converted into a numeric form. This mapping has to be consistent: every occurrence of a specific non-numeric value is mapped to the same numeric value.
During training, the data pre-processor sees the data for the very first time. While converting the non-numeric data, the pre-processor uses maps to remember the mapping.
For example, if all possible values for a feature are {yes, no, maybe}, then these values could be mapped like this: {"yes":1, "no":2, "maybe":3}
So an input feature/column that looked like [yes,yes,no,yes,maybe,yes] would now be converted into the internal form [1,1,2,1,3,1]. These numeric values are what the algorithms use.
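The yes/no/maybe mapping can be sketched in plain Java as follows. This is only an illustration, not Weka code: the class NominalEncoder and its methods are hypothetical names. It numbers values from 1 to match the example, whereas Weka itself stores a nominal value as its 0-based position in the attribute's declared value list.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NominalEncoder {
    // Maps each distinct label to the code it was assigned.
    private final Map<String, Integer> codes = new LinkedHashMap<>();

    // Assign codes 1, 2, 3, ... in the order the allowed values are declared.
    NominalEncoder(List<String> allowedValues) {
        for (String value : allowedValues) {
            codes.putIfAbsent(value, codes.size() + 1);
        }
    }

    // Look up the code of a single value; unknown values are rejected.
    int encode(String value) {
        Integer code = codes.get(value);
        if (code == null) {
            throw new IllegalArgumentException("Unknown value: " + value);
        }
        return code;
    }

    // Encode a whole column of values.
    int[] encodeAll(List<String> column) {
        int[] out = new int[column.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = encode(column.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        NominalEncoder enc = new NominalEncoder(Arrays.asList("yes", "no", "maybe"));
        int[] encoded = enc.encodeAll(Arrays.asList("yes", "yes", "no", "yes", "maybe", "yes"));
        System.out.println(Arrays.toString(encoded)); // prints [1, 1, 2, 1, 3, 1]
    }
}
```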
In Weka, this information is stored in the training Instances. So when the evaluator predicts a numeric value for a feature, it needs to convert this numeric value back to its actual value.
That is, if the algorithm outputs a value of 2, it needs the map to figure out that 2 corresponds to 'no'. To do this, it needs the mapping created before training. Hence it asks for the training Instances.
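The decoding step can be sketched in the same spirit. This plain-Java example is an illustration under assumptions, not Weka's code: LabelDecoder and decode are hypothetical names, and the value lists stand in for the header information carried by an Instances object. It shows that the same predicted number decodes to a different label when the values are declared in a different order, which is why the mapping used at training time has to be available.

```java
import java.util.Arrays;
import java.util.List;

final class LabelDecoder {
    // Declared values of the attribute, in header order.
    private final List<String> values;

    LabelDecoder(List<String> values) {
        this.values = values;
    }

    // Codes start at 1 as in {"yes":1, "no":2, "maybe":3}: code k maps to the k-th declared value.
    String decode(int code) {
        if (code < 1 || code > values.size()) {
            throw new IllegalArgumentException("No label for code " + code);
        }
        return values.get(code - 1);
    }

    public static void main(String[] args) {
        // Mapping that was in effect during training.
        LabelDecoder trainingHeader = new LabelDecoder(Arrays.asList("yes", "no", "maybe"));
        // Hypothetical mapping with the same values declared in another order.
        LabelDecoder otherHeader = new LabelDecoder(Arrays.asList("no", "yes", "maybe"));

        int predicted = 2; // the number the algorithm outputs
        System.out.println(trainingHeader.decode(predicted)); // prints no
        System.out.println(otherHeader.decode(predicted));    // prints yes
    }
}
```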
Note: AFAIK, the same logic applies in all ML frameworks, such as Weka, dl4j, etc.
Source: https://stackoverflow.com/questions/32605249/why-weka-evaluation-class-need-train-instances