Weka prediction (percentage confidence) - what does it mean?

Submitted by 与世无争的帅哥 on 2020-01-06 03:00:10

Question


I've been teaching myself Weka and have learned how to build models and get predictions out of them (predictions using the CLI).

When I run a previously built model on a data set, the output includes a "prediction" column, which gives the prediction confidence for each predicted instance.

I know what percent confidence means, but shouldn't the confidence of every prediction equal the accuracy of my Weka model?

In other words, if I have a J48 decision tree classifier with an accuracy of 90%, shouldn't every instance classified by this model have a prediction confidence of 90%?

Does anyone know how this percentage confidence is calculated, or how I should read the prediction error and model accuracy when telling others about my model? Thanks.


Answer 1:


Basically, when a decision tree is trained on a dataset, you often want to stop growing it (or, because of missing features, have to stop) before it overfits to every single training instance. When this happens, each leaf node in the tree holds several training samples. Very often the training labels at a leaf are still mixed at that point (neither all positive class nor all negative class).

The confidence is a measure of how consistent the training labels were at the leaf that an instance ends up in.
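As a rough illustration of this idea, the sketch below treats the confidence as the fraction of training labels at a leaf that agree with the majority class. This is an assumption made for illustration: the function name `leaf_confidence` and the example leaves are hypothetical, and a real implementation may smooth or weight these counts rather than use the raw proportion.

```python
from collections import Counter


def leaf_confidence(leaf_labels):
    """Return (predicted_class, confidence) for one leaf.

    Assumption: confidence is the raw share of the majority label
    among the training labels that ended up in this leaf.
    """
    counts = Counter(leaf_labels)
    predicted, majority_count = counts.most_common(1)[0]
    return predicted, majority_count / len(leaf_labels)


# Hypothetical leaves: one pure, one with mixed training labels.
pure_leaf = ["yes"] * 10
mixed_leaf = ["yes"] * 7 + ["no"] * 3

print(leaf_confidence(pure_leaf))   # ('yes', 1.0)
print(leaf_confidence(mixed_leaf))  # ('yes', 0.7)
```

Every instance routed to the mixed leaf would get the same prediction with the same 70% confidence, regardless of how accurate the tree is overall.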

Edit: note this is also used to handle missing features (attributes) in a clean and unbiased way.
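One way to picture the missing-attribute case, loosely following the C4.5 approach, is to send the instance down every branch of the split it cannot answer and blend the class distributions of the leaves it reaches. The sketch below is a simplified illustration under that assumption: `combine_branches`, the branch weights, and the leaf distributions are all hypothetical, not values or functions taken from Weka.

```python
def combine_branches(branches):
    """Blend per-branch class distributions into one distribution.

    branches: list of (weight, {class_label: probability}) pairs, where
    weight is assumed to be the share of training instances that went
    down that branch.
    """
    total = sum(weight for weight, _ in branches)
    combined = {}
    for weight, dist in branches:
        for label, prob in dist.items():
            combined[label] = combined.get(label, 0.0) + (weight / total) * prob
    return combined


# Hypothetical split: 60% of training data went left, 40% went right.
branches = [
    (0.6, {"yes": 0.9, "no": 0.1}),
    (0.4, {"yes": 0.25, "no": 0.75}),
]

dist = combine_branches(branches)
prediction = max(dist, key=dist.get)
print(prediction, round(dist[prediction], 2))
```

The resulting confidence is a weighted mix of the leaf proportions, which is another reason it varies from one instance to the next.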

See here for a brief definition of this.

Also look at some of Quinlan's work on decision trees for this, particularly his work on C4.5.

Also: "I know what percent confidence means but shouldn't all my predictions be the accuracy of my Weka Model?"

No, this isn't true. Some samples are easier to classify than others, and these scores reflect that.
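To make the distinction concrete, the sketch below computes accuracy as a single number over a whole set of predictions, while each prediction keeps its own confidence. The rows are made-up values for illustration only, not output from a real Weka run, and the helper name `accuracy` is hypothetical.

```python
def accuracy(rows):
    """Fraction of rows whose predicted label matches the actual label."""
    correct = sum(1 for actual, predicted, _ in rows if actual == predicted)
    return correct / len(rows)


# Made-up predictions: (actual, predicted, confidence)
rows = [
    ("yes", "yes", 1.0),
    ("yes", "yes", 0.7),
    ("no", "no", 0.95),
    ("no", "yes", 0.6),
    ("yes", "yes", 0.85),
]

confidences = [confidence for _, _, confidence in rows]

print(accuracy(rows))                       # 0.8
print(min(confidences), max(confidences))   # 0.6 1.0
```

Accuracy summarizes the model across all instances, whereas the confidence column describes how clear-cut each individual prediction was, so the two numbers need not match.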



Source: https://stackoverflow.com/questions/11084248/weka-prediction-percentage-confidence-what-does-it-mean
