Implementation of the Theil inequality index in Python

Submitted by 拈花ヽ惹草 on 2019-12-13 13:38:03

Question


I am trying to implement Theil's index (http://en.wikipedia.org/wiki/Theil_index) in Python to measure inequality of revenue in a list.

The formula is basically Shannon's entropy, so it deals with log. My problem is that I have a few revenues at 0 in my list, and log(0) makes my formula unhappy. I believe adding a tiny float to 0 wouldn't work as log(tinyFloat) = -inf, and that would mess my index up.

[EDIT] Here's a snippet (taken from another, much cleaner and freely available, implementation):

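    from math import exp, isclose, log

    def error_if_not_in_range01(value):
        # Rejects 0 as well, which is why zero revenues fail here.
        if (value <= 0) or (value > 1):
            raise ValueError(str(value) + ' is not in (0,1]!')

    def error_if_not_1(value):
        # Not shown in the original snippet; assumed to check
        # that the shares sum to 1 (within rounding error).
        if not isclose(value, 1.0):
            raise ValueError(str(value) + ' is not 1!')

    def H(x):
        entropy = 0.0
        total = 0.0
        for x_i in x: # work on all x[i]
            print(x_i)
            error_if_not_in_range01(x_i)
            total += x_i
            group_negentropy = x_i*log(x_i)
            entropy += group_negentropy
        error_if_not_1(total)
        return -entropy

    def T(x):
        print(x)
        n = len(x)
        maximum_entropy = log(n)
        actual_entropy = H(x)
        redundancy = maximum_entropy - actual_entropy
        inequality = 1 - exp(-redundancy)
        return redundancy, inequality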

Is there any way out of this problem?


Answer 1:


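If I understand you correctly, the formula you are trying to implement is the following (the Theil T index, with N the number of entries):

    T = (1/N) * sum over i of (Xi / mean(X)) * ln(Xi / mean(X))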

In this case, your problem is calculating the natural logarithm of Xi / mean(X), when Xi = 0.

However, that logarithm is always multiplied by Xi / mean(X). The product x * ln(x) tends to 0 as x tends to 0, so when Xi == 0 the term is conventionally taken to be zero and the value of ln(Xi / mean(X)) never matters. You can treat the value of the formula for that entry as zero, and skip calculating the logarithm entirely.
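The zero-skipping idea above can be sketched directly on raw revenues. This is only an illustration, not the asker's code: the name `theil_t` and its error handling are assumptions made for the example.

```python
from math import log

def theil_t(revenues):
    """Theil T index from raw, non-negative revenues (hypothetical helper)."""
    n = len(revenues)
    if n == 0:
        raise ValueError("need at least one revenue")
    if any(r < 0 for r in revenues):
        raise ValueError("revenues must be non-negative")
    mean = sum(revenues) / n
    if mean == 0:
        raise ValueError("index is undefined when every revenue is 0")
    total = 0.0
    for r in revenues:
        if r > 0:  # zero entries contribute 0, so log() is never called on 0
            ratio = r / mean
            total += ratio * log(ratio)
    return total / n

print(theil_t([10, 10, 10, 10]))  # perfect equality
print(theil_t([0, 0, 0, 40]))     # one entry holds everything
```

With perfect equality every ratio is 1 and the index is 0. When a single entry holds everything, the index reaches its maximum, ln(N).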

In the case that you are implementing Shannon's formula directly, the same holds:

    H = -sum over i of Pi * ln(Pi)

In both the first and second form, calculating the log is not necessary if Pi == 0, because whatever value it is, it will have been multiplied by zero.
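A quick numerical check may help with the worry about tiny floats. The logarithm of a very small positive float is a large negative number but still finite, and the product x * ln(x) shrinks towards 0. The helper name `xlogx` is made up for this sketch.

```python
from math import log

def xlogx(x):
    """x * ln(x), with the x == 0 case defined as its limit, 0."""
    return 0.0 if x == 0 else x * log(x)

# The product gets closer to 0 as x gets smaller.
for x in (1e-3, 1e-9, 1e-300, 0.0):
    print(x, xlogx(x))
```

So substituting a tiny float for 0 would give nearly the same result, but returning 0 outright is simpler and exact.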

UPDATE:

Given the code you quoted, you can replace x_i*log(x_i) with a function as follows:

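    def Group_negentropy(x_i):
        # x*log(x) tends to 0 as x tends to 0, so the term is defined as 0 there
        if x_i == 0:
            return 0.0
        else:
            return x_i*log(x_i)

    def H(x):
        entropy = 0.0
        total = 0.0
        for x_i in x: # work on all x[i]
            print(x_i)
            # This check must accept 0 (test value < 0, not value <= 0),
            # otherwise it raises before Group_negentropy is reached.
            error_if_not_in_range01(x_i)
            total += x_i
            group_negentropy = Group_negentropy(x_i)
            entropy += group_negentropy
        error_if_not_1(total)
        return -entropy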
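Note that `H` expects shares that sum to 1, while the question starts from a list of raw revenues. A minimal end-to-end sketch, assuming the revenues are non-negative and not all zero, could look like this. The names `shares`, `entropy` and `theil_from_shares` are hypothetical and not part of the quoted implementation.

```python
from math import exp, log

def shares(revenues):
    """Convert raw revenues into shares that sum to 1."""
    if any(r < 0 for r in revenues):
        raise ValueError("revenues must be non-negative")
    total = sum(revenues)
    if total == 0:
        raise ValueError("at least one revenue must be positive")
    return [r / total for r in revenues]

def entropy(p):
    """Shannon entropy in nats; zero shares are skipped (they contribute 0)."""
    return -sum(p_i * log(p_i) for p_i in p if p_i > 0)

def theil_from_shares(revenues):
    """Return (redundancy, inequality), mirroring T(x) in the question."""
    p = shares(revenues)
    redundancy = log(len(p)) - entropy(p)  # equals the Theil T index
    inequality = 1 - exp(-redundancy)
    return redundancy, inequality

print(theil_from_shares([0, 5, 15, 20]))
```

Since Xi / mean(X) equals N * Pi, the redundancy ln(N) - H is the same quantity as the Theil T index computed from the raw revenues.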


Source: https://stackoverflow.com/questions/20279458/implementation-of-theil-inequality-index-in-python
