发表新帖

发表新帖

KL-Divergence of two GMMs

后端未结

关注

 1  1791

伪装坚强ぢ

I have two GMMs that I used to fit two different sets of data in the same space, and I would like to calculate the KL-divergence between them.

Currently I am using the G

相关标签:

1条回答

小蘑菇

2021-02-04 13:05
There's no closed form for the KL divergence between GMMs. You can easily do Monte Carlo, though. Recall that KL(p||q) = \int p(x) log(p(x) / q(x)) dx = E_p[ log(p(x) / q(x)). So:
```
def gmm_kl(gmm_p, gmm_q, n_samples=10**5):
    X = gmm_p.sample(n_samples)
    log_p_X, _ = gmm_p.score_samples(X)
    log_q_X, _ = gmm_q.score_samples(X)
    return log_p_X.mean() - log_q_X.mean()
```
(mean(log(p(x) / q(x))) = mean(log(p(x)) - log(q(x))) = mean(log(p(x))) - mean(log(q(x))) is somewhat cheaper computationally.)

You don't want to use scipy.stats.entropy; that's for discrete distributions.

If you want the symmetrized and smoothed Jensen-Shannon divergence KL(p||(p+q)/2) + KL(q||(p+q)/2) instead, it's pretty similar:
```
def gmm_js(gmm_p, gmm_q, n_samples=10**5):
    X = gmm_p.sample(n_samples)
    log_p_X, _ = gmm_p.score_samples(X)
    log_q_X, _ = gmm_q.score_samples(X)
    log_mix_X = np.logaddexp(log_p_X, log_q_X)

    Y = gmm_q.sample(n_samples)
    log_p_Y, _ = gmm_p.score_samples(Y)
    log_q_Y, _ = gmm_q.score_samples(Y)
    log_mix_Y = np.logaddexp(log_p_Y, log_q_Y)

    return (log_p_X.mean() - (log_mix_X.mean() - np.log(2))
            + log_q_Y.mean() - (log_mix_Y.mean() - np.log(2))) / 2
```
(log_mix_X/log_mix_Y are actually the log of twice the mixture densities; pulling that out of the mean operation saves some flops.)
0 讨论(0)
发布评论:

提交评论
- 加载中...

热议问题