Question
Let's assume we have drawn n = 10000 samples from the standard normal distribution.
I want to estimate its entropy, using a histogram to estimate the probabilities.
1) Calculate the probabilities (for example, in MATLAB):
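[p,x] = hist(samples,binnumbers);   % p: counts per bin, x: bin centers
area = (x(2)-x(1))*sum(p);          % bin width times total count
p = p/area;                         % normalize so the histogram integrates to 1
(binnumbers is determined by some rule.)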
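For readers following along outside MATLAB, here is a rough NumPy equivalent of step 1. It is a sketch, not the asker's code: `histogram_density` is a hypothetical helper, the bin count 50 stands in for the unspecified `binnumbers` rule, and `np.histogram` spreads its bins between the sample minimum and maximum, which differs slightly from MATLAB's `hist`. It checks that the normalized bars integrate to 1:

```python
import numpy as np

def histogram_density(samples, bins):
    """Counts per bin, divided by (bin width * total count) as in step 1."""
    counts, edges = np.histogram(samples, bins=bins)
    width = edges[1] - edges[0]                 # equal-width bins
    density = counts / (width * counts.sum())   # same as p = p/area
    centers = 0.5 * (edges[:-1] + edges[1:])    # like MATLAB's x (bin centers)
    return density, centers, width

rng = np.random.default_rng(0)                  # seeded for reproducibility
samples = rng.standard_normal(10_000)
density, centers, width = histogram_density(samples, bins=50)  # 50 is an arbitrary choice
print(np.sum(density * width))                  # area under the bars, about 1.0
```

After this step `density` holds probability density values, not probabilities; each bin's probability is `density * width`.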
2) Estimate the entropy:
H = -sum(p.*log2(p))
This gives 58.6488.
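A quick NumPy check suggests that this number is not a property of the data alone but of the binning. In the sketch below (a hypothetical helper, arbitrary bin counts, and a seeded sample rather than the asker's data), the step-2 sum roughly doubles every time the number of bins doubles, i.e. every time the bin width halves:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(10_000)

def naive_entropy_bits(samples, bins):
    """The question's step 2 without the bin width: -sum(p*log2(p))."""
    counts, edges = np.histogram(samples, bins=bins)
    width = edges[1] - edges[0]
    p = counts / (width * counts.sum())   # density values, as in step 1
    p = p[p > 0]                          # skip empty bins, where log2(0) = -inf
    return -np.sum(p * np.log2(p))

naive = {bins: naive_entropy_bits(samples, bins) for bins in (50, 100, 200)}
for bins, value in naive.items():
    print(bins, round(value, 2))
```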
However, when I use the closed-form entropy of the standard normal distribution, I get
H = 0.5*log2(2*pi*exp(1))   % = 2.0471
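The closed form is easy to reproduce. The sketch below generalizes it to a normal distribution with standard deviation `sigma` (the formula 0.5*log2(2*pi*e*sigma^2) is standard; the function name is hypothetical). With `sigma = 1` it gives the value above, in bits because of `log2`:

```python
import math

def normal_entropy_bits(sigma=1.0):
    """Differential entropy of a normal distribution, in bits.
    The mean does not matter; only the standard deviation sigma does."""
    return 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)

print(round(normal_entropy_bits(), 4))  # 2.0471
```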
What am I doing wrong when combining the histogram with the entropy formula? Thank you very much for any help!
Answer 1:
You are missing the bin width dp in the sum:
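dp = x(2) - x(1);                  % bin width
area = sum(p)*dp;                  % area under the raw histogram
p = p/area;                        % density values (no-op if p is already normalized)
H = -sum( (p*dp) .* log2(p) );     % element-wise .* ; each term weighted by dp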
This should bring you close to the analytical value.
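Why the bin width matters, written out in the question's notation (a sketch: f_i stands for the normalized histogram value p in bin i, Δx for dp, and q_i = f_i Δx for the fraction of samples in bin i). Differential entropy is an integral, and the histogram version is a Riemann sum over bins:

```latex
H = -\int f(x)\,\log_2 f(x)\,dx
  \approx -\sum_i f_i \log_2(f_i)\,\Delta x
  = -\sum_i q_i \log_2 q_i + \log_2 \Delta x
```

Dropping the factor Δx leaves -Σ f_i log2(f_i) ≈ H/Δx, which grows as the bins get narrower. This is why the value in the question is much larger than 2.0471 and changes with binnumbers.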
PS: be careful when you take log2(p), because some bins may be empty: log2(0) is -Inf, and 0*(-Inf) is NaN, so a plain sum returns NaN. You might find nansum useful, since it ignores NaN terms.
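Putting the fix and the empty-bin caveat together, here is a NumPy translation of the corrected estimator. It is a sketch under assumptions: `histogram_entropy_bits` is a hypothetical name, the bin counts are arbitrary, and `np.nansum` plays the role of MATLAB's `nansum`. The estimates should land within a few hundredths of the closed-form value:

```python
import numpy as np

def histogram_entropy_bits(samples, bins):
    """Histogram estimate of differential entropy in bits: -sum(p*dp*log2(p))."""
    counts, edges = np.histogram(samples, bins=bins)
    dp = edges[1] - edges[0]                     # bin width (equal-width bins)
    p = counts / (counts.sum() * dp)             # density values, integrate to 1
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = (p * dp) * np.log2(p)            # empty bin: 0 * -inf -> nan
    return -np.nansum(terms)                     # nansum skips the nan terms

rng = np.random.default_rng(0)
samples = rng.standard_normal(10_000)
exact = 0.5 * np.log2(2 * np.pi * np.e)          # about 2.0471 bits

estimates = {bins: histogram_entropy_bits(samples, bins) for bins in (20, 50, 100)}
for bins, h in estimates.items():
    print(bins, round(h, 3), round(exact, 3))
```

Unlike the step-2 sum in the question, these estimates stay close to each other as the number of bins changes, because each term is weighted by dp.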
Source: https://stackoverflow.com/questions/16527672/entropy-estimation-using-histogram-of-normal-data-vs-direct-formula-matlab