Question
I have an ARFF table with boolean results.
Most of the lines end with "0" (about 95%). But the "0" lines don't interest me; I want Weka to find the lines that end with "1".
Unfortunately, most of the algorithms just predict "0" all of the time, which doesn't help me at all.
How can I make Weka find the "1" lines only (if that is possible)?
Answer 1:
I think you are describing the classic class imbalance problem. Almost every machine learning algorithm is designed to maximize accuracy. In your case, assigning 0 every time yields 95% accuracy, and by that measure it is the best the algorithm can do (for more info, google "unbalanced classes" or "class imbalance"). However, in cases like this the minority class is the one of greater interest.
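To make the 95% figure concrete, here is a small sketch in plain Python (not Weka). The 1000-row dataset is made up to mirror the 95/5 split from the question; it shows that a model which always answers 0 scores high on accuracy while finding none of the "1" rows.

```python
# Hypothetical data: 1000 rows, 5% labelled 1 (same split as in the question).
labels = [1] * 50 + [0] * 950
predictions = [0] * len(labels)  # a "model" that always answers 0

# Accuracy: share of rows where the prediction matches the label.
correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)

# Recall for class 1: share of the real "1" rows that the model found.
true_positives = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
actual_positives = sum(labels)
recall_class_1 = true_positives / actual_positives

print(f"accuracy = {accuracy:.2f}")                # accuracy = 0.95
print(f"recall for class 1 = {recall_class_1:.2f}")  # recall for class 1 = 0.00
```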
A few quick solutions: upsample class 1, downsample class 0, or combine both to get a balanced dataset for training. You can use the WEKA SpreadSubsample filter for that. You can also have a look at the SMOTE filter and the MetaCost classifier.
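The idea behind downsampling can be sketched in a few lines of plain Python. This is only an illustration of the technique, not how Weka implements its filter; the function name `downsample_to_balance`, the toy rows, and the assumption that the class label is the last value in each row are all made up for the example.

```python
import random
from collections import defaultdict

def downsample_to_balance(rows, seed=0):
    """Randomly drop rows from the larger classes until every class
    has as many rows as the smallest one.
    Assumes the class label is the last value in each row."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[-1]].append(row)

    smallest = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced

# Hypothetical data: 95 rows of class 0 and 5 rows of class 1.
rows = [[i, 0] for i in range(95)] + [[i, 1] for i in range(95, 100)]
balanced = downsample_to_balance(rows)
counts = {label: sum(1 for r in balanced if r[-1] == label) for label in (0, 1)}
print(counts)  # {0: 5, 1: 5}
```

Note that this throws away 90 of the 95 majority rows, which is why upsampling or a combination of both is sometimes preferred on small datasets.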
If you are for some reason interested in accuracy, you have to test the classifier on the original distribution, so apply SpreadSubsample through a FilteredClassifier, which resamples only the training data. However, as you may have already noticed, if you are interested in the minority class, accuracy is not a very reliable indicator of model performance. So have a look at class recall, the ROC curve and AUC. A great article about ROC is here: http://www.hpl.hp.com/techreports/2003/HPL-2003-4.pdf
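As a rough illustration of why AUC is more informative here, the sketch below computes it in plain Python as the fraction of (positive, negative) pairs in which the positive row gets the higher score, with ties counting as half. The function name `auc` and the scores are invented for the example; in Weka you would read the value from the evaluation output instead.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen class-1 row is scored above a randomly chosen
    class-0 row (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities of class 1.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.7, 0.6, 0.9]
model_auc = auc(labels, scores)

# A model that gives every row the same score cannot rank at all.
constant_auc = auc(labels, [0.0] * len(labels))

print(model_auc)     # 0.875
print(constant_auc)  # 0.5
```

A constant "always 0" predictor lands at 0.5, the same as random guessing, no matter how high its accuracy looks on an imbalanced dataset.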
Good luck
Source: https://stackoverflow.com/questions/22999500/how-to-edit-weka-configurations-to-find-1