Jensen-Shannon Divergence

前端 未结 5 1463
醉话见心
醉话见心 2021-01-31 05:22

I have another question that I was hoping someone could help me with.

I\'m using the Jensen-Shannon-Divergence to measure the similarity between two probability distribu

相关标签:
5条回答
  • 2021-01-31 06:06

    Note that the scipy entropy call below is the Kullback-Leibler divergence.

    See: http://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence

    #!/usr/bin/env python
    from scipy.stats import entropy
    from numpy.linalg import norm
    import numpy as np
    
    def JSD(P, Q):
        _P = P / norm(P, ord=1)
        _Q = Q / norm(Q, ord=1)
        _M = 0.5 * (_P + _Q)
        return 0.5 * (entropy(_P, _M) + entropy(_Q, _M))
    

    Also note that the test case in the Question looks erred?? The sum of the p distribution does not add to 1.0.

    See: http://www.itl.nist.gov/div898/handbook/eda/section3/eda361.htm

    0 讨论(0)
  • 2021-01-31 06:08

    A general version, for n probability distributions, in python

    import numpy as np
    from scipy.stats import entropy as H
    
    
    def JSD(prob_distributions, weights, logbase=2):
        # left term: entropy of misture
        wprobs = weights * prob_distributions
        mixture = wprobs.sum(axis=0)
        entropy_of_mixture = H(mixture, base=logbase)
    
        # right term: sum of entropies
        entropies = np.array([H(P_i, base=logbase) for P_i in prob_distributions])
        wentropies = weights * entropies
        sum_of_entropies = wentropies.sum()
    
        divergence = entropy_of_mixture - sum_of_entropies
        return(divergence)
    
    # From the original example with three distributions:
    P_1 = np.array([1/2, 1/2, 0])
    P_2 = np.array([0, 1/10, 9/10])
    P_3 = np.array([1/3, 1/3, 1/3])
    
    prob_distributions = np.array([P_1, P_2, P_3])
    n = len(prob_distributions)
    weights = np.empty(n)
    weights.fill(1/n)
    
    print(JSD(prob_distributions, weights))
    #0.546621319446
    
    0 讨论(0)
  • 2021-01-31 06:25

    Get some data for distributions with known divergence and compare your results against those known values.

    BTW: the sum in KL_divergence may be rewritten using the zip built-in function like this:

    sum(_p * log(_p / _q) for _p, _q in zip(p, q) if _p != 0)
    

    This does away with lots of "noise" and is also much more "pythonic". The double comparison with 0.0 and 0 is not necessary.

    0 讨论(0)
  • 2021-01-31 06:25

    Explicitly following the math in the Wikipedia article:

    def jsdiv(P, Q):
        """Compute the Jensen-Shannon divergence between two probability distributions.
    
        Input
        -----
        P, Q : array-like
            Probability distributions of equal length that sum to 1
        """
    
        def _kldiv(A, B):
            return np.sum([v for v in A * np.log2(A/B) if not np.isnan(v)])
    
        P = np.array(P)
        Q = np.array(Q)
    
        M = 0.5 * (P + Q)
    
        return 0.5 * (_kldiv(P, M) +_kldiv(Q, M))
    
    0 讨论(0)
  • 2021-01-31 06:28

    Since the Jensen-Shannon distance (distance.jensenshannon) has been included in Scipy 1.2, the Jensen-Shannon divergence can be obtained as the square of the Jensen-Shannon distance:

    from scipy.spatial import distance
    
    distance.jensenshannon([1.0/10, 9.0/10, 0], [0, 1.0/10, 9.0/10]) ** 2
    # 0.5306056938642212
    
    0 讨论(0)
提交回复
热议问题