Weka output predictions

落花浮王杯 submitted on 2019-12-12 06:36:55

Question


I've used the Weka GUI to train a classifier and test it on a separate file (making predictions), but I can't do the same with the API. The error I'm getting says the training and test files have a different number of attributes. In the GUI, this can be solved by checking "Output predictions".

How can I do something similar using the API? Do you know of any samples out there?

import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.NominalToBinary;
import weka.filters.unsupervised.attribute.Remove;

public class WekaTutorial
{

  public static void main(String[] args) throws Exception
  {
    DataSource trainSource = new DataSource("/tmp/classes - edited.arff"); // training
    Instances trainData = trainSource.getDataSet();

    DataSource testSource = new DataSource("/tmp/classes_testing.arff");
    Instances testData = testSource.getDataSet();

    if (trainData.classIndex() == -1)
    {
      trainData.setClassIndex(trainData.numAttributes() - 1);
    }

    if (testData.classIndex() == -1)
    {
      testData.setClassIndex(testData.numAttributes() - 1);
    }    

    String[] options = weka.core.Utils.splitOptions("weka.filters.unsupervised.attribute.StringToWordVector -R first-last -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer -M 1 "
            + "-tokenizer \"weka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"");

    Remove remove = new Remove();
    remove.setOptions(options);
    remove.setInputFormat(trainData);

    NominalToBinary filter = new NominalToBinary(); 

    NaiveBayes nb = new NaiveBayes();

    FilteredClassifier fc = new FilteredClassifier();
    fc.setFilter(filter);
    fc.setClassifier(nb);
    // train and make predictions
    fc.buildClassifier(trainData);

    for (int i = 0; i < testData.numInstances(); i++)
    {
      double pred = fc.classifyInstance(testData.instance(i));
      System.out.print("ID: " + testData.instance(i).value(0));
      System.out.print(", actual: " + testData.classAttribute().value((int) testData.instance(i).classValue()));
      System.out.println(", predicted: " + testData.classAttribute().value((int) pred));
    }

  }

}

Error:
Exception in thread "main" java.lang.IllegalArgumentException: Src and Dest differ in # of attributes: 2 != 17152

This was not an issue in the GUI.
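The message reports that one dataset has 2 attributes and the other 17152, which suggests the two ARFF files do not share a header (for example, one still holding raw text while the other was already converted to word vectors). A minimal sketch, assuming a recent Weka (3.8) on the classpath, that surfaces the difference before any prediction is attempted uses Instances.equalHeadersMsg, which returns null when two headers are compatible and a description of the first difference otherwise. The class name HeaderCheck and the helper headerMismatch are hypothetical; the file paths are the ones from the question.

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class HeaderCheck {

  /** Returns null when the two headers are compatible, otherwise Weka's description of the difference. */
  public static String headerMismatch(Instances train, Instances test) {
    return train.equalHeadersMsg(test);
  }

  public static void main(String[] args) throws Exception {
    // File paths from the question; the class is assumed to be the last attribute, as in the question's code.
    Instances train = new DataSource("/tmp/classes - edited.arff").getDataSet();
    Instances test = new DataSource("/tmp/classes_testing.arff").getDataSet();
    train.setClassIndex(train.numAttributes() - 1);
    test.setClassIndex(test.numAttributes() - 1);

    String msg = headerMismatch(train, test);
    if (msg != null) {
      // Stop here instead of letting classifyInstance() fail inside the filter.
      System.err.println("Train and test headers differ: " + msg);
      System.err.println("train: " + train.numAttributes() + " attributes, test: "
          + test.numAttributes() + " attributes");
      return;
    }
    System.out.println("Headers match.");
  }
}
```

If the check fails, the test set has to be brought into the same format as the training set before calling classifyInstance.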


Answer 1:


You need to ensure that the categories in the training and test sets are compatible. Try the following:

  • combine the training and test sets
  • preprocess them together
  • save the result as ARFF
  • open two empty files
  • copy the header (from the top of the file down to the "@data" line) into both files
  • copy the training instances into the first file and the test instances into the second
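The same steps can also be carried out through the API. A minimal sketch, assuming a recent Weka (3.8) on the classpath and two raw ARFF files with the same attributes in the same order (for example a string attribute plus a nominal class, with no relational attributes): the rows are combined, StringToWordVector is run once, and the result is split back into a training part and a test part that share one header. Values are copied by their text rather than by internal index, because string and nominal values are stored as indices into each file's own header. The class SharedPreprocessing, its helper methods, and the file paths in main are hypothetical.

```java
import java.io.File;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ArffSaver;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class SharedPreprocessing {

  /** Copies every row of src into dest, mapping string and nominal values by their text. */
  static void appendByValue(Instances dest, Instances src) {
    for (int i = 0; i < src.numInstances(); i++) {
      Instance row = src.instance(i);
      double[] vals = new double[dest.numAttributes()];
      for (int j = 0; j < dest.numAttributes(); j++) {
        Attribute att = dest.attribute(j);
        if (row.isMissing(j)) {
          vals[j] = Utils.missingValue();
        } else if (att.isString()) {
          vals[j] = att.addStringValue(row.stringValue(j)); // index into dest's own string list
        } else if (att.isNominal()) {
          int idx = att.indexOfValue(row.stringValue(j));
          if (idx < 0) {
            throw new IllegalArgumentException(
                "Label '" + row.stringValue(j) + "' not declared for attribute " + att.name());
          }
          vals[j] = idx;
        } else {
          vals[j] = row.value(j); // numeric and date values are copied as-is
        }
      }
      dest.add(new DenseInstance(row.weight(), vals));
    }
  }

  /** Combines train and test, runs StringToWordVector once, then splits them again. */
  static Instances[] vectorizeTogether(Instances train, Instances test) throws Exception {
    if (train.numAttributes() != test.numAttributes()) {
      throw new IllegalArgumentException("Raw train and test sets must have the same attributes");
    }
    // 1. Combine: a header with emptied string attributes, training rows first.
    Instances combined = train.stringFreeStructure();
    combined.setClassIndex(train.classIndex());
    appendByValue(combined, train);
    appendByValue(combined, test);

    // 2. Preprocess once, so both parts get one word dictionary and one header.
    StringToWordVector s2wv = new StringToWordVector();
    s2wv.setInputFormat(combined);
    Instances vectorized = Filter.useFilter(combined, s2wv);

    // 3. Split back: the first train.numInstances() rows form the training set.
    int nTrain = train.numInstances();
    Instances trainVec = new Instances(vectorized, 0, nTrain);
    Instances testVec = new Instances(vectorized, nTrain, vectorized.numInstances() - nTrain);
    return new Instances[] {trainVec, testVec};
  }

  /** Writes a dataset to an ARFF file. */
  static void saveArff(Instances data, String path) throws Exception {
    ArffSaver saver = new ArffSaver();
    saver.setInstances(data);
    saver.setFile(new File(path));
    saver.writeBatch();
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical paths to the raw (not yet vectorized) files; class assumed last.
    Instances train = new DataSource("/tmp/train_raw.arff").getDataSet();
    Instances test = new DataSource("/tmp/test_raw.arff").getDataSet();
    train.setClassIndex(train.numAttributes() - 1);
    test.setClassIndex(test.numAttributes() - 1);

    Instances[] parts = vectorizeTogether(train, test);
    saveArff(parts[0], "/tmp/train_vectorized.arff");
    saveArff(parts[1], "/tmp/test_vectorized.arff");
  }
}
```

With this approach the word dictionary is built from the training and test text together, as the first step of the answer implies; the two saved files then have identical headers, so a classifier trained on the first can be applied to the second.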


Source: https://stackoverflow.com/questions/44451779/weka-output-predictions
