1 Attribute Selection
2 Introduction to Attribute Selection Algorithms
2-1 Attribute Subset Evaluators
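The CfsSubsetEval evaluator assesses the predictive ability of each attribute together with the redundancy among attributes. It favors subsets whose attributes are highly correlated with the class attribute but have low correlation with one another. One option iteratively adds the attributes most highly correlated with the class, provided the subset does not already contain an attribute whose correlation with the candidate attribute is even higher. The evaluator can treat missing values as a separate value; otherwise, the counts of missing values are distributed across the other values in proportion to their frequency.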
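The trade-off between class correlation and redundancy can be sketched with the commonly cited CFS merit, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean attribute-class correlation and r_ff the mean attribute-attribute correlation. The snippet below is a minimal illustration under stated assumptions, not Weka's implementation: the attribute names `x1`, `x2`, `x3` and all correlation values are invented, and the locally-predictive and missing-value options are not modeled.

```python
import math
from itertools import combinations


def cfs_merit(subset, class_corr, pair_corr):
    """CFS merit of a subset: rewards class correlation, penalizes redundancy."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean correlation between each attribute and the class.
    r_cf = sum(class_corr[a] for a in subset) / k
    # Mean pairwise correlation among the attributes (0 for a single attribute).
    pairs = list(combinations(sorted(subset), 2))
    r_ff = sum(pair_corr[p] for p in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_forward(attrs, class_corr, pair_corr):
    """Forward selection: add the best attribute until the merit stops improving."""
    selected, best = [], 0.0
    while True:
        scored = [
            (cfs_merit(selected + [a], class_corr, pair_corr), a)
            for a in attrs
            if a not in selected
        ]
        if not scored:
            break
        merit, attr = max(scored)
        if merit <= best:
            break
        selected.append(attr)
        best = merit
    return selected, best


# Hypothetical correlations: x2 is nearly a copy of x1, x3 is weaker but independent.
class_corr = {"x1": 0.6, "x2": 0.5, "x3": 0.4}
pair_corr = {("x1", "x2"): 0.95, ("x1", "x3"): 0.05, ("x2", "x3"): 0.05}

selected, merit = greedy_forward(["x1", "x2", "x3"], class_corr, pair_corr)
print(selected, round(merit, 4))
```

With these made-up numbers the search keeps `x1` and `x3` and skips `x2`: although `x2` correlates more strongly with the class than `x3`, it is almost redundant with `x1`, so adding it lowers the merit.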
2-2 Single-Attribute Evaluators
2-3 Search Methods
3 Worked Example: Attribute Selection in Weka
=== Run information ===

Evaluator:    weka.attributeSelection.CfsSubsetEval -P 1 -E 1
Search:       weka.attributeSelection.GreedyStepwise -T -1.7976931348623157E308 -N -1 -num-slots 1
Relation:     labor-neg-data
Instances:    57
Attributes:   17
              duration
              wage-increase-first-year
              wage-increase-second-year
              wage-increase-third-year
              cost-of-living-adjustment
              working-hours
              pension
              standby-pay
              shift-differential
              education-allowance
              statutory-holidays
              vacation
              longterm-disability-assistance
              contribution-to-dental-plan
              bereavement-assistance
              contribution-to-health-plan
              class
Evaluation mode:    evaluate on all training data

=== Attribute Selection on all input data ===

Search Method:
    Greedy Stepwise (forwards).
    Start set: no attributes
    Merit of best subset found:    0.363

Attribute Subset Evaluator (supervised, Class (nominal): 17 class):
    CFS Subset Evaluator
    Including locally predictive attributes

Selected attributes: 2,3,5,11,12,13,14 : 7
                     wage-increase-first-year
                     wage-increase-second-year
                     cost-of-living-adjustment
                     statutory-holidays
                     vacation
                     longterm-disability-assistance
                     contribution-to-dental-plan
=== Run information ===

Evaluator:    weka.attributeSelection.WrapperSubsetEval -B weka.classifiers.trees.J48 -F 5 -T 0.01 -R 1 -E DEFAULT -- -C 0.25 -M 2
Search:       weka.attributeSelection.BestFirst -D 1 -N 5
Relation:     labor-neg-data
Instances:    57
Attributes:   17
              duration
              wage-increase-first-year
              wage-increase-second-year
              wage-increase-third-year
              cost-of-living-adjustment
              working-hours
              pension
              standby-pay
              shift-differential
              education-allowance
              statutory-holidays
              vacation
              longterm-disability-assistance
              contribution-to-dental-plan
              bereavement-assistance
              contribution-to-health-plan
              class
Evaluation mode:    evaluate on all training data

=== Attribute Selection on all input data ===

Search Method:
    Best first.
    Start set: no attributes
    Search direction: forward
    Stale search after 5 node expansions
    Total number of subsets evaluated: 138
    Merit of best subset found:    0.842

Attribute Subset Evaluator (supervised, Class (nominal): 17 class):
    Wrapper Subset Evaluator
    Learning scheme: weka.classifiers.trees.J48
    Scheme options: -C 0.25 -M 2
    Subset evaluation: classification accuracy
    Number of folds for accuracy estimation: 5

Selected attributes: 1,2,4,6,11,12 : 6
                     duration
                     wage-increase-first-year
                     wage-increase-third-year
                     working-hours
                     statutory-holidays
                     vacation
=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     labor-neg-data
Instances:    57
Attributes:   17
              duration
              wage-increase-first-year
              wage-increase-second-year
              wage-increase-third-year
              cost-of-living-adjustment
              working-hours
              pension
              standby-pay
              shift-differential
              education-allowance
              statutory-holidays
              vacation
              longterm-disability-assistance
              contribution-to-dental-plan
              bereavement-assistance
              contribution-to-health-plan
              class
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree
------------------

wage-increase-first-year <= 2.5: bad (15.27/2.27)
wage-increase-first-year > 2.5
|   statutory-holidays <= 10: bad (10.77/4.77)
|   statutory-holidays > 10: good (30.96/1.0)

Number of Leaves  :     3

Size of the tree :      5

Time taken to build model: 0.04 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances          42               73.6842 %
Incorrectly Classified Instances        15               26.3158 %
Kappa statistic                          0.4415
Mean absolute error                      0.3192
Root mean squared error                  0.4669
Relative absolute error                 69.7715 %
Root relative squared error             97.7888 %
Coverage of cases (0.95 level)          91.2281 %
Mean rel. region size (0.95 level)      85.9649 %
Total Number of Instances               57

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0.700    0.243    0.609      0.700    0.651      0.444    0.695     0.559     bad
                 0.757    0.300    0.824      0.757    0.789      0.444    0.695     0.738     good
Weighted Avg.    0.737    0.280    0.748      0.737    0.740      0.444    0.695     0.675

=== Confusion Matrix ===

  a  b   <-- classified as
 14  6 |  a = bad
  9 28 |  b = good
=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     labor-neg-data-weka.filters.unsupervised.attribute.Remove-R1,4,6-10,15-16
Instances:    57
Attributes:   8
              wage-increase-first-year
              wage-increase-second-year
              cost-of-living-adjustment
              statutory-holidays
              vacation
              longterm-disability-assistance
              contribution-to-dental-plan
              class
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree
------------------

wage-increase-first-year <= 2.5: bad (15.27/2.27)
wage-increase-first-year > 2.5
|   longterm-disability-assistance = yes
|   |   statutory-holidays <= 10
|   |   |   wage-increase-first-year <= 3: bad (2.0)
|   |   |   wage-increase-first-year > 3: good (3.99)
|   |   statutory-holidays > 10: good (25.67)
|   longterm-disability-assistance = no
|   |   vacation = below_average: bad (5.09/1.09)
|   |   vacation = average: good (2.64/1.0)
|   |   vacation = generous: good (2.34)

Number of Leaves  :     7

Size of the tree :      12

Time taken to build model: 0 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances          44               77.193  %
Incorrectly Classified Instances        13               22.807  %
Kappa statistic                          0.4935
Mean absolute error                      0.2787
Root mean squared error                  0.441
Relative absolute error                 60.9191 %
Root relative squared error             92.3655 %
Coverage of cases (0.95 level)          89.4737 %
Mean rel. region size (0.95 level)      78.0702 %
Total Number of Instances               57

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0.650    0.162    0.684      0.650    0.667      0.494    0.737     0.586     bad
                 0.838    0.350    0.816      0.838    0.827      0.494    0.733     0.777     good
Weighted Avg.    0.772    0.284    0.770      0.772    0.771      0.494    0.735     0.710

=== Confusion Matrix ===

  a  b   <-- classified as
 13  7 |  a = bad
  6 31 |  b = good
=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     labor-neg-data-weka.filters.unsupervised.attribute.Remove-R3,5,7-10,13-16
Instances:    57
Attributes:   7
              duration
              wage-increase-first-year
              wage-increase-third-year
              working-hours
              statutory-holidays
              vacation
              class
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree
------------------

wage-increase-first-year <= 2.5: bad (15.27/2.27)
wage-increase-first-year > 2.5
|   statutory-holidays <= 10
|   |   vacation = below_average: bad (7.54/1.54)
|   |   vacation = average: bad (0.0)
|   |   vacation = generous: good (3.23)
|   statutory-holidays > 10: good (30.96/1.0)

Number of Leaves  :     5

Size of the tree :      8

Time taken to build model: 0 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances          46               80.7018 %
Incorrectly Classified Instances        11               19.2982 %
Kappa statistic                          0.5905
Mean absolute error                      0.2593
Root mean squared error                  0.4162
Relative absolute error                 56.6868 %
Root relative squared error             87.1592 %
Coverage of cases (0.95 level)          92.9825 %
Mean rel. region size (0.95 level)      78.9474 %
Total Number of Instances               57

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0.800    0.189    0.696      0.800    0.744      0.594    0.775     0.608     bad
                 0.811    0.200    0.882      0.811    0.845      0.594    0.775     0.808     good
Weighted Avg.    0.807    0.196    0.817      0.807    0.810      0.594    0.775     0.738

=== Confusion Matrix ===

  a  b   <-- classified as
 16  4 |  a = bad
  7 30 |  b = good
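The three J48 runs above differ only in their input attributes: all 17 attributes, the 7 attributes chosen by CfsSubsetEval, and the 6 attributes chosen by WrapperSubsetEval. As a rough cross-check, the sketch below recomputes accuracy and Cohen's kappa from the three confusion matrices. The matrices are copied from the output; the function name `accuracy_and_kappa` and the run labels are illustrative choices, not part of Weka.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are actual classes, columns are predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Confusion matrices copied from the three cross-validation runs.
runs = {
    "all 17 attributes": [[14, 6], [9, 28]],
    "CFS subset": [[13, 7], [6, 31]],
    "wrapper subset": [[16, 4], [7, 30]],
}

results = {name: accuracy_and_kappa(m) for name, m in runs.items()}
for name, (acc, kappa) in results.items():
    print(f"{name}: accuracy={acc:.4f} kappa={kappa:.4f}")
```

The recomputed values agree with the summaries Weka reports (73.68 %, 77.19 % and 80.70 % accuracy). With only 57 instances the gaps amount to two to four instances, and because both attribute selections were run on the full data set before cross-validation, the figures for the reduced data sets are probably somewhat optimistic.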