问题
Overview
I know that one can get the percentages of each prediction in a trained WEKA model through the GUI and command line options as conveniently explained and demonstrated in the documentation article "Making predictions".
Predictions
I know that there are three ways documented to get these predictions:
- command line
- GUI
- Java code/using the WEKA API, which I was able to do in the answer to "Get risk predictions in WEKA using own Java code"
- this fourth one requires a generated WEKA
.MODEL
file
I have a trained .MODEL
file and now I want to classify new instances using this together with the prediction percentages similar to the one below (an output of the GUI's Explorer, in CSV
format):
inst#,actual,predicted,error,distribution,
1,1:0,2:1,+,0.399409,*0.7811
2,1:0,2:1,+,0.3932409,*0.8191
3,1:0,2:1,+,0.399409,*0.600591
4,1:0,2:1,+,0.139409,*0.64
5,1:0,2:1,+,0.399409,*0.600593
6,1:0,2:1,+,0.3993209,*0.600594
7,1:0,2:1,+,0.500129,*0.600594
8,1:0,2:1,+,0.399409,*0.90011
9,1:0,2:1,+,0.211409,*0.60182
10,1:0,2:1,+,0.21909,*0.11101
The predicted
column is what I want to get from a .MODEL
file.
What I know
Based from my experience with the WEKA API approach, one can get these predictions using the following code (the PlainText
inserted into an Evaluation
object) BUT I do not want to do k-fold cross-validation that is provided by the Evaluation
object.
StringBuffer predictionSB = new StringBuffer();
Range attributesToShow = null;
Boolean outputDistributions = new Boolean(true);
PlainText predictionOutput = new PlainText();
predictionOutput.setBuffer(predictionSB);
predictionOutput.setOutputDistribution(true);
Evaluation evaluation = new Evaluation(data);
evaluation.crossValidateModel(j48Model, data, numberOfFolds,
randomNumber, predictionOutput, attributesToShow,
outputDistributions);
System.out.println(predictionOutput.getBuffer());
From the WEKA documentation
Note that a .MODEL
file classifies data from an .ARFF
or related input is discussed in "Use Weka in your Java code" and "Serialization" a.k.a. "How to use a .MODEL
file in your own Java code to classify new instances" (why the vague title smfh).
Using own Java code to classify
Loading a .MODEL
file is through "Deserialization" and the following is for versions > 3.5.5:
// deserialize model
Classifier cls = (Classifier) weka.core.SerializationHelper.read("/some/where/j48.model");
An Instance
object is the data and it is fed to the classifyInstance
. An output is provided here (depending on the data type of the outcome attribute):
// classify an Instance object (testData)
cls.classifyInstance(testData.instance(0));
The question "How to reuse saved classifier created from explorer(in weka) in eclipse java" has a great answer too!
Javadocs
I have already checked the Javadocs for Classifier (the trained model) and Evaluation (just in case) but none directly and explicitly addresses this issue.
The only thing closest to what I want is the classifyInstances
method of the Classifier
:
Classifies the given test instance. The instance has to belong to a dataset when it's being classified. Note that a classifier MUST implement either this or distributionForInstance().
How can I simultaneously use a WEKA .MODEL
file to classify and get predictions of a new instance using my own Java code (aka using the WEKA API)?
回答1:
This answer simply updates my answer from How to reuse saved classifier created from explorer(in weka) in eclipse java.
I will show how to obtain the predicted instance value and the prediction percentage (or distribution). The example model is a J48 decision tree created and saved in the Weka Explorer. It was built from the nominal weather data provided with Weka. It is called "tree.model".
import weka.classifiers.Classifier;
import weka.core.Instances;
public class Main {
public static void main(String[] args) throws Exception
{
String rootPath="/some/where/";
Instances originalTrain= //instances here
//load model
Classifier cls = (Classifier) weka.core.SerializationHelper.read(rootPath+"tree.model");
//predict instance class values
Instances originalTrain= //load or create Instances to predict
//which instance to predict class value
int s1=0;
//perform your prediction
double value=cls.classifyInstance(originalTrain.instance(s1));
//get the prediction percentage or distribution
double[] percentage=cls.distributionForInstance(originalTrain.instance(s1));
//get the name of the class value
String prediction=originalTrain.classAttribute().value((int)value);
System.out.println("The predicted value of instance "+
Integer.toString(s1)+
": "+prediction);
//Format the distribution
String distribution="";
for(int i=0; i <percentage.length; i=i+1)
{
if(i==value)
{
distribution=distribution+"*"+Double.toString(percentage[i])+",";
}
else
{
distribution=distribution+Double.toString(percentage[i])+",";
}
}
distribution=distribution.substring(0, distribution.length()-1);
System.out.println("Distribution:"+ distribution);
}
}
The output from this is:
The predicted value of instance 0: no
Distribution: *1, 0
来源:https://stackoverflow.com/questions/21674522/get-prediction-percentage-in-weka-using-own-java-code-and-a-model