Question
I have written a function to calculate the entropy of a vector in which each element is the number of items belonging to one class.
function x = Entropy(a)
t = sum(a);
t = repmat(t, [1, size(a, 2)]);
x = sum(-a./t .* log2(a./t));
end
For example, with a = [4 0], the entropy is -(0/4)*log2(0/4) - (4/4)*log2(4/4).
With the function above, however, the entropy comes out as NaN when the split is pure, as in this example, because log2(0) is -Inf and 0 * -Inf is NaN. The entropy of a pure split should be zero.
How can I solve this with the least impact on performance, given that the data is very large? Thanks
Answer 1:
I would suggest you create your own log2 function:
function res=mylog2(a)
res=log2(a);
res(isinf(res))=0;
end
This function, while breaking the log2 behaviour, can be used in your specific example because you multiply the result by the argument of the log, which is zero in exactly the places where the result was replaced, so the product is zero. It is not "mathematically correct", but I believe that's what you are looking for.
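To show how this replacement fits into the question's function, here is a minimal sketch. The name EntropySafe is hypothetical (it does not appear in the question or the answer), and the sketch assumes a is a row vector of non-negative class counts with at least one non-zero entry. It relies on the usual convention that 0*log2(0) counts as 0, which matches the limit of p*log2(p) as p approaches 0 from above.

```matlab
function x = EntropySafe(a)
% EntropySafe  Entropy in bits of a row vector of class counts.
% Hypothetical name; assumes a is non-negative and sum(a) > 0.
p = a ./ sum(a);      % class proportions (scalar division, no repmat needed)
l = log2(p);          % -Inf wherever p == 0
l(isinf(l)) = 0;      % same replacement as mylog2 in the answer
x = -sum(p .* l);     % empty classes now contribute 0 * 0 = 0
end
```

Called as EntropySafe([4 0]), this should return 0 for the pure split instead of NaN, while an even split such as EntropySafe([2 2]) still gives 1 bit. An alternative that leaves log2 untouched would be to drop the zero proportions before taking the log, for example p = p(p > 0).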
Source: https://stackoverflow.com/questions/30534797/entropy-of-pure-split-caculated-to-nan