Entropy of pure split calculated to NaN

拥有回忆 submitted on 2019-12-11 03:27:44

Question


I have written a function to calculate the entropy of a vector in which each element is the number of samples belonging to one class.

function x = Entropy(a)
    t = sum(a);
    t = repmat(t, [1, size(a, 2)]);
    x = sum(-a./t .* log2(a./t));
end

For example, with a = [4 0]: entropy = -(0/4)*log2(0/4) - (4/4)*log2(4/4)

With this function, however, the entropy is NaN whenever the split is pure, as in the example above: log2(0) is -Inf, and 0 * -Inf evaluates to NaN. The entropy of a pure split should be zero.
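
The following short script is only an illustrative sketch of where the NaN appears for the a = [4 0] example. The variable names p and terms are introduced here for illustration and are not part of the original function.

```matlab
a = [4 0];                  % class counts for a pure split
p = a / sum(a);             % class proportions: [1 0]
terms = -p .* log2(p);      % log2(0) is -Inf, and 0 * -Inf is NaN
x = sum(terms);             % the NaN term propagates through sum
```

Only the term for the empty class is NaN; the term for the non-empty class is zero, as expected.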

How can I fix this with the least impact on performance, given that the data set is very large? Thanks.


Answer 1:


I would suggest you create your own log2 function:

function res = mylog2(a)
    res = log2(a);
    res(isinf(res)) = 0;  % replace infinite results (-Inf from log2(0)) with 0
end

This function breaks the usual behaviour of log2, but it works in your specific case because each logarithm is multiplied by its own argument: whenever log2 would return -Inf, that argument is 0, so the term becomes 0 * 0 = 0 instead of 0 * -Inf = NaN. It is not "mathematically correct", but I believe it is what you are looking for.
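
As an alternative sketch (not part of the answer above), the zero counts can be dropped before taking the logarithm, so log2 itself stays untouched. The function name EntropySafe is hypothetical, and the sketch assumes a is a vector of non-negative class counts, as in the question.

```matlab
function x = EntropySafe(a)
    % EntropySafe is a hypothetical name used for illustration only.
    % a: vector of non-negative class counts, as in the question.
    p = a(a > 0) / sum(a);      % proportions of the non-empty classes only
    x = -sum(p .* log2(p));     % empty classes never reach log2
end
```

For a = [4 0] only the proportion 1 remains, so the result is zero, while a = [2 2] still gives an entropy of 1. Whether the logical indexing is faster or slower than patching the log2 output on large data would need to be measured.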



Source: https://stackoverflow.com/questions/30534797/entropy-of-pure-split-caculated-to-nan
