Question
I am trying to implement Theil's index (http://en.wikipedia.org/wiki/Theil_index) in Python to measure inequality of revenue in a list.
The formula is basically Shannon's entropy, so it deals with log. My problem is that I have a few revenues at 0 in my list, and log(0) makes my formula unhappy. I believe adding a tiny float to 0 wouldn't work as log(tinyFloat) = -inf, and that would mess my index up.
[EDIT] Here's a snippet (taken from another, much cleaner and freely available, implementation):
from math import exp, log

def error_if_not_in_range01(value):
    if (value <= 0) or (value > 1):
        raise Exception(str(value) + ' is not in [0,1)!')

def H(x):
    n = len(x)
    entropy = 0.0
    sum = 0.0
    for x_i in x:  # work on all x[i]
        print(x_i)
        error_if_not_in_range01(x_i)
        sum += x_i
        group_negentropy = x_i*log(x_i)
        entropy += group_negentropy
    error_if_not_1(sum)  # helper from the quoted implementation, not shown here
    return -entropy

def T(x):
    print(x)
    n = len(x)
    maximum_entropy = log(n)
    actual_entropy = H(x)
    redundancy = maximum_entropy - actual_entropy
    inequality = 1 - exp(-redundancy)
    return redundancy, inequality
Is there any way out of this problem?
Answer 1:
If I understand you correctly, the formula you are trying to implement is the following:
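Assuming the Theil T index from the Wikipedia page linked in the question is meant, with N values x_i and mean \bar{x} (this notation is an assumption), it reads:

```latex
T = \frac{1}{N} \sum_{i=1}^{N} \frac{x_i}{\bar{x}} \ln\!\left(\frac{x_i}{\bar{x}}\right),
\qquad
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i
```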
In this case, your problem is calculating the natural logarithm of Xi / mean(X) when Xi = 0.
However, the logarithm is always multiplied by Xi / mean(X), so when Xi == 0 the value of ln(Xi / mean(X)) doesn't matter: the product is taken to be zero (the limit of x*ln(x) as x approaches 0 is 0). You can treat the value of the formula for that entry as zero, and skip calculating the logarithm entirely.
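The rule above can be sketched in Python 3 roughly as follows. This is a minimal sketch rather than code from the question or from any library: the function name theil_t and its input checks are assumptions made for illustration, and it takes raw non-negative values instead of shares.

```python
import math


def theil_t(values):
    """Theil T index of non-negative values (hypothetical helper).

    Zero entries contribute 0 to the sum, so their logarithm is never taken.
    """
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")

    mean = sum(values) / len(values)
    if mean == 0:
        raise ValueError("index is undefined when all values are zero")

    total = 0.0
    for v in values:
        if v == 0:
            continue  # r*ln(r) tends to 0 as r tends to 0, so skip the log
        ratio = v / mean
        total += ratio * math.log(ratio)
    return total / len(values)
```

With this sketch, a perfectly equal list such as [1, 1, 1, 1] gives 0, and a list where one entry holds everything, such as [0, 0, 0, 4], gives ln(4), the maximum for four entries.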
In the case that you are implementing Shannon's formula directly, the same holds:
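In the usual notation (assumed here), with p_i the share of group i and the shares summing to 1, Shannon's entropy has two equivalent forms. The limit stated alongside them is what justifies treating 0 * log(0) as 0:

```latex
H = \sum_{i=1}^{n} p_i \ln\frac{1}{p_i} = -\sum_{i=1}^{n} p_i \ln p_i,
\qquad
\lim_{p \to 0^{+}} p \ln p = 0
```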
In both the first and second form, calculating the log is not necessary if Pi == 0, because whatever value it is, it will have been multiplied by zero.
UPDATE:
Given the code you quoted, you can replace x_i*log(x_i) with a function as follows:
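from math import log

def error_if_not_in_range01(value):
    # Relaxed from the question's check (value <= 0) so that 0 is accepted;
    # otherwise the exception fires before Group_negentropy is ever called.
    if (value < 0) or (value > 1):
        raise Exception(str(value) + ' is not in [0,1]!')

def Group_negentropy(x_i):
    # x*log(x) tends to 0 as x tends to 0, so the term is defined as 0 there.
    if x_i == 0:
        return 0
    else:
        return x_i*log(x_i)

def H(x):
    n = len(x)
    entropy = 0.0
    sum = 0.0
    for x_i in x:  # work on all x[i]
        print(x_i)
        error_if_not_in_range01(x_i)
        sum += x_i
        group_negentropy = Group_negentropy(x_i)
        entropy += group_negentropy
    error_if_not_1(sum)  # helper from the quoted implementation, not shown here
    return -entropy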
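If the input is a list of raw revenues rather than shares, one way to put the pieces together is sketched below in Python 3. The helper names shares, entropy and redundancy are hypothetical, and normalising by the total is an assumption about how the revenues are turned into shares. With p_i = x_i / sum(x), the redundancy ln(n) - H equals the Theil T index of the revenues.

```python
import math


def shares(revenues):
    """Turn non-negative revenues into shares that sum to 1 (hypothetical helper)."""
    total = sum(revenues)
    if total <= 0 or any(r < 0 for r in revenues):
        raise ValueError("revenues must be non-negative with a positive total")
    return [r / total for r in revenues]


def entropy(p):
    """Shannon entropy in nats; zero shares are skipped (0 * log 0 := 0)."""
    return -sum(p_i * math.log(p_i) for p_i in p if p_i > 0)


def redundancy(p):
    """Maximum entropy ln(n) minus the actual entropy."""
    return math.log(len(p)) - entropy(p)


if __name__ == "__main__":
    p = shares([0, 0, 50, 150])
    r = redundancy(p)
    # Same inequality measure as in the question's T(): 1 - exp(-redundancy)
    print(r, 1 - math.exp(-r))
```

Filtering with p_i > 0 inside the generator keeps the zero handling in one place, so no separate range check has to be relaxed.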
Source: https://stackoverflow.com/questions/20279458/implementation-of-theil-inequality-index-in-python