Weka output predictions

问题

I've used the Weka GUI for training and testing a file (making predictions), but can't do the same with the API. The error I'm getting says there's a different number of attributes in the train and test files. In the GUI, this can be solved by checking "Output predictions".

How to do something similar using the API? do you know of any samples out there?

import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.NominalToBinary;
import weka.filters.unsupervised.attribute.Remove;

public class WekaTutorial
{

  public static void main(String[] args) throws Exception
  {
    DataSource trainSource = new DataSource("/tmp/classes - edited.arff"); // training
    Instances trainData = trainSource.getDataSet();

    DataSource testSource = new DataSource("/tmp/classes_testing.arff");
    Instances testData = testSource.getDataSet();

    if (trainData.classIndex() == -1)
    {
      trainData.setClassIndex(trainData.numAttributes() - 1);
    }

    if (testData.classIndex() == -1)
    {
      testData.setClassIndex(testData.numAttributes() - 1);
    }    

    String[] options = weka.core.Utils.splitOptions("weka.filters.unsupervised.attribute.StringToWordVector -R first-last -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer -M 1 "
            + "-tokenizer \"weka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"");

    Remove remove = new Remove();
    remove.setOptions(options);
    remove.setInputFormat(trainData);

    NominalToBinary filter = new NominalToBinary(); 

    NaiveBayes nb = new NaiveBayes();

    FilteredClassifier fc = new FilteredClassifier();
    fc.setFilter(filter);
    fc.setClassifier(nb);
    // train and make predictions
    fc.buildClassifier(trainData);

    for (int i = 0; i < testData.numInstances(); i++)
    {
      double pred = fc.classifyInstance(testData.instance(i));
      System.out.print("ID: " + testData.instance(i).value(0));
      System.out.print(", actual: " + testData.classAttribute().value((int) testData.instance(i).classValue()));
      System.out.println(", predicted: " + testData.classAttribute().value((int) pred));
    }

  }

}

Error:
Exception in thread "main" java.lang.IllegalArgumentException: Src and Dest differ in # of attributes: 2 != 17152

This was not an issue for the GUI.

回答1:

You need to ensure that categories in train and test sets are compatible, try to

combine train and test sets
List item
preprocess them
save them as arff
open two empty files
copy the header from the top to line "@data"
copy in training set into first file and test set into second file

来源：https://stackoverflow.com/questions/44451779/weka-output-predictions

标签

java

weka