Question
Simple question I hope.
If I have a set of data like this:
Classification  attribute-1  attribute-2
Correct         dog          dog
Correct         dog          dog
Wrong           dog          cat
Correct         cat          cat
Wrong           cat          dog
Wrong           cat          dog
Then what is the information gain of attribute-2 relative to attribute-1?
I've computed the entropy of the whole data set: -(3/6)log2(3/6)-(3/6)log2(3/6)=1
Then I'm stuck! I think you need to calculate entropies of attribute-1 and attribute-2 too? Then use these three calculations in an information gain calculation?
Any help would be great,
Thank you :).
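Answer 1:
Well, first you have to calculate, for each attribute, the entropy of the class labels within each of its values, and combine those as a weighted average. After that you calculate the information gain by subtracting that average from the entropy of the whole data set. Here is how it should be done, with c = Correct and w = Wrong.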
For attribute-1:
attr-1=dog:
info([2c,1w])=entropy(2/3,1/3)≈0.918
attr-1=cat:
info([1c,2w])=entropy(1/3,2/3)≈0.918
Weighted average entropy for attribute-1:
info([2c,1w],[1c,2w])=(3/6)*info([2c,1w])+(3/6)*info([1c,2w])≈0.918
Gain for attribute-1:
gain("attr-1")=info([3c,3w])-info([2c,1w],[1c,2w])≈1-0.918=0.082
And you have to do the same for the next attribute.
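The steps above can be checked with a short script. This is only a minimal sketch: the `rows` list is the table from the question typed in by hand, and the helper names `entropy` and `information_gain` are made up for illustration, not taken from any library.

```python
from collections import Counter, defaultdict
from math import log2

# The six rows from the question: (classification, attribute-1, attribute-2).
rows = [
    ("Correct", "dog", "dog"),
    ("Correct", "dog", "dog"),
    ("Wrong",   "dog", "cat"),
    ("Correct", "cat", "cat"),
    ("Wrong",   "cat", "dog"),
    ("Wrong",   "cat", "dog"),
]

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    # sum of p * log2(1/p), which equals -sum of p * log2(p)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())

def information_gain(rows, attr_index):
    """Entropy of the whole set minus the weighted entropy after splitting."""
    labels = [r[0] for r in rows]
    subsets = defaultdict(list)
    for r in rows:
        # Group the class labels by the value of the chosen attribute.
        subsets[r[attr_index]].append(r[0])
    weighted = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

gain_attr1 = information_gain(rows, 1)
gain_attr2 = information_gain(rows, 2)
print(round(gain_attr1, 4))  # 0.0817
print(gain_attr2)            # 0, up to floating-point rounding
```

For attribute-2 the split gives [2c,2w] for dog and [1c,1w] for cat, both with entropy 1, so its gain is 0: on its own it tells you nothing about the classification in this data set.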
Source: https://stackoverflow.com/questions/5465447/entropy-and-information-gain