weka batch filtering StringToWordVector

Question

I'm trying to use Weka for text classification. I have two ARFF files:

One for the training set (example of a row in the data):

"mouse",no,no,no,no,no,yes,no

and another one for the test set (example of a row in the data):

"cat",?,?,?,?,?,?,?

They have the same attribute declarations, but when I use batch filtering, Weka reports "Input file formats differ". Why?

Here is the command I use:

C:\Programmi\Weka-3-6>java -cp C:\Programmi\Weka-3-6\weka.jar 
  weka.filters.unsupervised.attribute.StringToWordVector -b -i test1.arff
  -o output_training.arff -c last -r tent.arff -s output_tent.arff
  -R -O -C -T -I -N 0 -M 1

Here are the headers:

1) training

@RELATION tent

@Attribute text                 string
@Attribute politica             {yes,no}
@Attribute sports               {yes,no}
@Attribute cinema/tv/musica     {yes,no}
@Attribute stato_personale      {yes,no}
@Attribute moda/stile           {yes,no}
@Attribute conversazione        {yes,no}
@Attribute attualità            {yes,no}

2) test

@RELATION test

@Attribute text                 string
@Attribute politica             {yes,no}
@Attribute sports               {yes,no}
@Attribute cinema/tv/musica     {yes,no}
@Attribute stato_personale      {yes,no}
@Attribute moda/stile           {yes,no}
@Attribute conversazione        {yes,no}
@Attribute attualità            {yes,no}

I also tried setting the same @RELATION name in both files, but I get the same error. Each file works fine on its own: I can run StringToWordVector on either one separately without problems. Thanks again.
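To narrow down where the two headers might differ, a plain textual comparison of the @attribute declarations can help. The following is only a minimal sketch and does not use Weka at all: the class name `ArffHeaderDiff` and its methods are hypothetical, it assumes both files are read as UTF-8, and it only normalizes keyword case and whitespace, so it may flag differences that Weka would ignore (or miss ones that Weka detects). It could still surface invisible mismatches, for example in how a name such as `attualità` is encoded in each file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of Weka): compares the @attribute
// declarations of two ARFF files as text, position by position.
public class ArffHeaderDiff {

    // Collects the @attribute lines that appear before @data,
    // with the keyword lowercased and runs of whitespace collapsed.
    static List<String> attributeLines(String arffText) {
        List<String> out = new ArrayList<>();
        for (String raw : arffText.split("\\R")) {
            String line = raw.trim();
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("@data")) {
                break; // header ends here
            }
            if (lower.startsWith("@attribute")) {
                String rest = line.substring("@attribute".length());
                out.add(("@attribute" + rest).replaceAll("\\s+", " "));
            }
        }
        return out;
    }

    // Returns one message per attribute position where the headers differ.
    static List<String> differences(String first, String second) {
        List<String> left = attributeLines(first);
        List<String> right = attributeLines(second);
        List<String> diffs = new ArrayList<>();
        int n = Math.max(left.size(), right.size());
        for (int i = 0; i < n; i++) {
            String l = i < left.size() ? left.get(i) : "<missing>";
            String r = i < right.size() ? right.get(i) : "<missing>";
            if (!l.equals(r)) {
                diffs.add("attribute " + (i + 1) + ": [" + l + "] vs [" + r + "]");
            }
        }
        return diffs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ArffHeaderDiff first.arff second.arff");
            return;
        }
        // Assumption: both files are UTF-8; adjust the charset if they are not.
        String first = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String second = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        List<String> diffs = differences(first, second);
        if (diffs.isEmpty()) {
            System.out.println("No textual differences in the @attribute declarations.");
        } else {
            diffs.forEach(System.out::println);
        }
    }
}
```

It could be run against the two files from the command above, e.g. `java ArffHeaderDiff tent.arff test1.arff`. If it reports no differences, the header text itself is probably not the culprit and the cause would have to be looked for elsewhere.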

Source: https://stackoverflow.com/questions/27425952/weka-batch-filtering-stringtowordvector

Tags

weka

text-classification
