Entropy estimation using histogram of normal data vs direct formula (MATLAB)

Posted by こ雲淡風輕ζ on 2019-12-22 09:09:53

Question


Let's assume we have drawn n = 10000 samples from the standard normal distribution.

Now I want to estimate its entropy, using a histogram to estimate the probabilities.

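1) Calculate the probabilities (for example, using MATLAB):

[p,x] = hist(samples,binnumbers);   % bin counts p and bin centers x
area = (x(2)-x(1))*sum(p);          % bin width times total count
p = p/area;                         % normalize so the histogram integrates to 1

(binnumbers is chosen according to some binning rule.)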

2) Estimate the entropy:

H = -sum(p.*log2(p))

which gives 58.6488.

Now, when I use the direct formula for the entropy of normally distributed data, I get

H = 0.5*log2(2*pi*exp(1)) = 2.0471

What am I doing wrong when combining the histogram with the entropy formula? Thank you very much for any help!


Answer 1:


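You are missing the bin-width term dp in the sum:

dp = (x(2)-x(1));                  % bin width
area = sum(p)*dp;
p = p/area;                        % normalize to a density (no effect if p is already normalized)
H = -sum( (p*dp) .* log2(p) );     % element-wise product, weighted by the bin width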

This should bring you close to the theoretical value.
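
To spell out why the bin width matters, here is a short sketch of the underlying approximation. The symbols f_i (the normalized histogram height in bin i, i.e. p after dividing by area) and \Delta (the bin width, dp in the code) are notation introduced here for illustration, not taken from the original post. The direct formula gives a differential entropy, which is an integral, so its histogram approximation is a Riemann sum that carries a factor \Delta:

```latex
H = -\int f(x)\,\log_2 f(x)\,dx
  \;\approx\; -\sum_i f_i \,\log_2 (f_i)\,\Delta ,
\qquad\text{so}\qquad
-\sum_i f_i \,\log_2 (f_i) \;\approx\; \frac{H}{\Delta} .
```

Under this reading, the sum without the bin width is too large by roughly a factor of 1/\Delta. If the histogram estimate is otherwise accurate, the reported 58.6488 would correspond to a bin width of about 2.0471/58.6488 ≈ 0.035, although the actual bin width is not stated in the question.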

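PS: be careful when taking log2(p), because you may sometimes have empty bins. For an empty bin, p is 0 and log2(p) is -Inf, so the product 0*(-Inf) evaluates to NaN and makes the whole sum NaN. You might find nansum useful.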
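
Putting the pieces together, the following is a minimal end-to-end sketch rather than a definitive implementation. The fixed seed, the bin count of 50, and the variable names counts, nz, H_hist and H_exact are assumptions made for illustration. Empty bins are skipped with a logical mask, which has the same effect as ignoring the NaN terms.

```matlab
rng(0);                                % fixed seed (assumption), only for reproducibility
n = 10000;
samples = randn(n, 1);                 % draws from the standard normal distribution
binnumbers = 50;                       % assumed bin count; any reasonable rule could be used

[counts, x] = hist(samples, binnumbers);   % raw bin counts and bin centers
dp = x(2) - x(1);                          % bin width
p = counts / (sum(counts) * dp);           % density estimate: integrates to 1

nz = p > 0;                                % mask out empty bins to avoid 0*log2(0) = NaN
H_hist = -sum(p(nz) * dp .* log2(p(nz)));  % Riemann-sum estimate of the differential entropy
H_exact = 0.5 * log2(2*pi*exp(1));         % closed form for the standard normal, in bits

fprintf('histogram estimate: %.4f bits, direct formula: %.4f bits\n', H_hist, H_exact);
```

The two printed values should agree to within a few hundredths of a bit for this sample size; the exact histogram estimate depends on the random draw and on the number of bins.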



Source: https://stackoverflow.com/questions/16527672/entropy-estimation-using-histogram-of-normal-data-vs-direct-formula-matlab
