I'm trying to filter a dataset using Weka's Java API. I've successfully filtered the attributes I want with a StringToWordVector filter in Weka's GUI, but I can't seem to do the same in my Java code. I copied the auto-generated filter options from the GUI and pasted them into my code, but I keep getting errors. Currently, my code looks like this:
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

Instances newInsts = new Instances(this.instances);
StringToWordVector stringFilter = new StringToWordVector();
stringFilter.setOptions(
    weka.core.Utils.splitOptions("-R 1,2,3,4,8 -W 1000 "
        + "-prune-rate -1.0 -N 0 -stemmer "
        + "weka.core.stemmers.NullStemmer -M 1 "
        + "-tokenizer \"weka.core.tokenizers.WordTokenizer "
        + "-delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\""));
stringFilter.setInputFormat(newInsts);
newInsts = Filter.useFilter(newInsts, stringFilter);
But I keep getting this error in my Eclipse console: No value given for -delimiters option.
(I added line breaks to the code above for readability. I suspect the problem has something to do with escaping characters/quotation marks...)
Thanks!
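The suspicion about escaping looks right, as far as I can tell from how `Utils.splitOptions` handles quotes. Once the Java compiler has removed its own escapes, the string handed to `splitOptions` contains `-tokenizer "weka.core.tokenizers.WordTokenizer -delimiters " \r\n...`, and the unescaped `"` just before the delimiter list closes the quoted tokenizer spec early. WordTokenizer is then given `-delimiters` with nothing after it, which matches the error message. Keeping the `splitOptions` route would mean escaping every backslash and quote inside the tokenizer spec one more time in the Java source. A less error-prone sketch, assuming you want to keep the explicit options, is to build the option array yourself so the outer level of quoting disappears. `StringToWordVectorOptions`, `TOKENIZER_SPEC` and `buildOptions` are hypothetical names. Weka still splits the tokenizer spec internally, so that one element keeps its inner quotes, escaped once for Java only.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: builds the StringToWordVector options as an array,
 * so only the tokenizer spec still needs Weka-level quoting.
 */
public class StringToWordVectorOptions {

    // One array element holding the whole tokenizer spec. Weka splits this
    // element itself, so the delimiter list keeps its own quotes; the
    // backslashes and quotes here are escaped for the Java compiler only.
    static final String TOKENIZER_SPEC =
            "weka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\'\\\"()?!\"";

    // Returns a fresh array on every call.
    static String[] buildOptions() {
        return new String[] {
            "-R", "1,2,3,4,8",
            "-W", "1000",
            "-prune-rate", "-1.0",
            "-N", "0",
            "-stemmer", "weka.core.stemmers.NullStemmer",
            "-M", "1",
            "-tokenizer", TOKENIZER_SPEC
        };
    }

    public static void main(String[] args) {
        // Show exactly what the filter would receive at runtime.
        String[] options = buildOptions();
        System.out.println(Arrays.toString(options));
        System.out.println("tokenizer spec: " + options[options.length - 1]);
    }
}
```

In the question's code, `stringFilter.setOptions(StringToWordVectorOptions.buildOptions())` would take the place of the `splitOptions(...)` call. Returning a new array each time is deliberate, since Weka's option parsing can modify the array it is given.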
You can actually omit most of the options, as they are already the defaults for StringToWordVector: apart from -R 1,2,3,4,8, every value in that string matches the default. The delimiters you're trying to pass are also the default delimiters of the default tokenizer, WordTokenizer, namely:
" \r\n\t.,;:'\"()?!" (written as a Java string literal)
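Configuring the filter through its setters avoids option-string quoting altogether. The sketch below assumes Weka 3.x is on the classpath and that the other options can stay at their defaults; `WordVectorExample`, `buildFilter` and `toWordVectors` are hypothetical names. Only the attribute range is strictly needed. The explicit tokenizer is optional and is included to show that a delimiter string passed through `setDelimiters` is an ordinary Java string with real carriage-return, newline and tab characters, not an extra layer of escaping.

```java
import weka.core.Instances;
import weka.core.tokenizers.WordTokenizer;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class WordVectorExample {

    /** Builds the filter, setting only what differs from (or restates) the defaults. */
    static StringToWordVector buildFilter() {
        StringToWordVector filter = new StringToWordVector();
        // Same as -R 1,2,3,4,8: the string attributes to convert.
        filter.setAttributeIndices("1,2,3,4,8");

        // Optional: WordTokenizer already uses these delimiters by default.
        // Setting them here only illustrates that no option-string escaping is involved.
        WordTokenizer tokenizer = new WordTokenizer();
        tokenizer.setDelimiters(" \r\n\t.,;:'\"()?!");
        filter.setTokenizer(tokenizer);
        return filter;
    }

    /** Applies the filter to a copy of the data, as the question's code does. */
    static Instances toWordVectors(Instances data) throws Exception {
        Instances copy = new Instances(data);
        StringToWordVector filter = buildFilter();
        filter.setInputFormat(copy);
        return Filter.useFilter(copy, filter);
    }
}
```

With this in place, `newInsts = WordVectorExample.toWordVectors(this.instances);` would replace the `setOptions`, `setInputFormat` and `useFilter` lines in the question.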
Source: https://stackoverflow.com/questions/4963210/weka-stringtowordvector-filter-stringoptions