Question
I'm trying to use Weka for text classification. I have two ARFF files:
One for the training set (example data row):
"mouse",no,no,no,no,no,yes,no
and another one for the test set (example data row):
"cat",?,?,?,?,?,?,?
They have the same attribute declarations, but when I use batch filtering Weka tells me "Input file formats differ". Why?
Here is the command that I use:
C:\Programmi\Weka-3-6>java -cp C:\Programmi\Weka-3-6\weka.jar
weka.filters.unsupervised.attribute.StringToWordVector -b -i test1.arff
-o output_training.arff -c last -r tent.arff -s output_tent.arff
-R -O -C -T -I -N 0 -M 1
Here are the headers:
1) training
@RELATION tent
@Attribute text string
@Attribute politica {yes,no}
@Attribute sports {yes,no}
@Attribute cinema/tv/musica {yes,no}
@Attribute stato_personale {yes,no}
@Attribute moda/stile {yes,no}
@Attribute conversazione {yes,no}
@Attribute attualità {yes,no}
2) test
@RELATION test
@Attribute text string
@Attribute politica {yes,no}
@Attribute sports {yes,no}
@Attribute cinema/tv/musica {yes,no}
@Attribute stato_personale {yes,no}
@Attribute moda/stile {yes,no}
@Attribute conversazione {yes,no}
@Attribute attualità {yes,no}
I also tried setting the same @RELATION name in both files, but I get the same error. Processed separately, the two files work fine and StringToWordVector runs correctly on each. Thanks again
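To double-check that the two headers really are identical, including differences that are invisible in an editor (for example how `attualità` is encoded in each file), a small script along these lines may help. It is only a rough sketch, not part of Weka: `header_lines`, `header_diff` and `diff_files` are hypothetical helper names, it looks only at the @Attribute lines (changing @RELATION did not help), and it does not reproduce Weka's own header comparison.

```python
import re


def header_lines(text):
    """Collect the @attribute declarations that appear before @data."""
    decls = []
    for raw in text.split("\n"):
        # Collapse only spaces and tabs, so unusual characters stay visible.
        line = re.sub(r"[ \t]+", " ", raw.strip(" \t\r"))
        lowered = line.lower()
        if lowered.startswith("@data"):
            break
        if lowered.startswith("@attribute"):
            decls.append(line)
    return decls


def header_diff(train_text, test_text):
    """Return (position, train_decl, test_decl) for each differing declaration."""
    a, b = header_lines(train_text), header_lines(test_text)
    diffs = []
    for i in range(max(len(a), len(b))):
        left = a[i] if i < len(a) else None
        right = b[i] if i < len(b) else None
        if left != right:
            diffs.append((i, left, right))
    return diffs


def diff_files(train_path, test_path):
    """Print every differing declaration of two ARFF files using repr()."""
    texts = []
    for path in (train_path, test_path):
        # latin-1 maps each byte to exactly one character, so files saved
        # in different encodings give visibly different repr() output.
        with open(path, encoding="latin-1", newline="") as handle:
            texts.append(handle.read())
    for pos, left, right in header_diff(*texts):
        print(pos, repr(left), repr(right))
```

Called as `diff_files("tent.arff", "test1.arff")` (the file names from the command above), it prints nothing when all attribute declarations match after whitespace normalization.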
Source: https://stackoverflow.com/questions/27425952/weka-batch-filtering-stringtowordvector