Question
I have two tensors, prob_a and prob_b, with shape [None, 1000], and I want to compute the KL divergence from prob_a to prob_b. Is there a built-in function for this in TensorFlow? I tried using tf.contrib.distributions.kl(prob_a, prob_b), but it gives:
NotImplementedError: No KL(dist_a || dist_b) registered for dist_a type Tensor and dist_b type Tensor
If there is no built-in function, what would be a good workaround?
Answer 1:
Assuming that your input tensors prob_a and prob_b are probability tensors that sum to 1 along the last axis, you could do it like this:
def kl(x, y):
    X = tf.distributions.Categorical(probs=x)
    Y = tf.distributions.Categorical(probs=y)
    return tf.distributions.kl_divergence(X, Y)

result = kl(prob_a, prob_b)
A simple example:
import numpy as np
import tensorflow as tf
a = np.array([[0.25, 0.1, 0.65], [0.8, 0.15, 0.05]])
b = np.array([[0.7, 0.2, 0.1], [0.15, 0.8, 0.05]])
sess = tf.Session()
print(kl(a, b).eval(session=sess)) # [0.88995184 1.08808468]
You would get the same result with np.sum(a * np.log(a / b), axis=1).
However, this implementation is a bit buggy (checked in TensorFlow 1.8.0). If you have zero probabilities in a, e.g. if you use [0.8, 0.2, 0.0] instead of [0.8, 0.15, 0.05], you will get nan, even though by the Kullback-Leibler definition 0 * log(0 / b) should contribute zero. To mitigate this, one should add some small numerical constant. It is also prudent to use tf.distributions.kl_divergence(X, Y, allow_nan_stats=False) to cause a runtime error in such situations. Also, if there are zeros in b, you will get inf values, which won't be caught by the allow_nan_stats=False option, so those have to be handled as well.
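One possible mitigation, as a sketch (the smoothing constant eps and the helper name kl_smoothed are my own additions, not part of the original answer): add a small constant to both inputs and renormalize before building the distributions.

import tensorflow as tf

def kl_smoothed(x, y, eps=1e-8):
    # Shift both distributions away from exact zeros and renormalize;
    # eps is an arbitrary small constant and may need tuning.
    x = (x + eps) / tf.reduce_sum(x + eps, axis=-1, keepdims=True)
    y = (y + eps) / tf.reduce_sum(y + eps, axis=-1, keepdims=True)
    X = tf.distributions.Categorical(probs=x)
    Y = tf.distributions.Categorical(probs=y)
    # allow_nan_stats=False causes a runtime error instead of silently
    # producing nan, as described above.
    return tf.distributions.kl_divergence(X, Y, allow_nan_stats=False)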
Answer 2:
Since softmax_cross_entropy_with_logits is available, there is no need to optimize the KL divergence directly; it differs from the cross entropy only by a constant:
KL(prob_a, prob_b)
= Sum(prob_a * log(prob_a/prob_b))
= Sum(prob_a * log(prob_a) - prob_a * log(prob_b))
= - Sum(prob_a * log(prob_b)) + Sum(prob_a * log(prob_a))
= - Sum(prob_a * log(prob_b)) + const
= H(prob_a, prob_b) + const
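As a sketch of what this means in practice (the names prob_a and logits_b are placeholders I am assuming, using the [None, 1000] shape from the question): when prob_a is held fixed, minimizing the cross entropy below also minimizes the KL divergence, because the two differ only by the constant Sum(prob_a * log(prob_a)).

import tensorflow as tf

prob_a = tf.placeholder(tf.float32, [None, 1000])    # fixed target distribution
logits_b = tf.placeholder(tf.float32, [None, 1000])  # logits of the distribution being trained

# H(prob_a, softmax(logits_b)); equals KL(prob_a || softmax(logits_b)) up to
# a constant that does not depend on logits_b.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=prob_a, logits=logits_b)
loss = tf.reduce_mean(cross_entropy)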
Answer 3:
I'm not sure why it's not implemented, but perhaps there is a workaround. The KL divergence is defined as:
KL(prob_a, prob_b) = Sum(prob_a * log(prob_a/prob_b))
The cross entropy H, on the other hand, is defined as:
H(prob_a, prob_b) = -Sum(prob_a * log(prob_b))
So, if you create a variable y = prob_a / prob_b, you could obtain the KL divergence as the negative of H(prob_a, y). In TensorFlow notation, something like:
KL = tf.reduce_mean(-tf.nn.softmax_cross_entropy_with_logits(labels=prob_a, logits=y))
Answer 4:
tf.contrib.distributions.kl takes instances of tf.distributions.Distribution, not a Tensor.
Example:
import tensorflow as tf

ds = tf.contrib.distributions
p = ds.Normal(loc=0., scale=1.)
q = ds.Normal(loc=1., scale=2.)
kl = ds.kl_divergence(p, q)
# ==> 0.44314718
Answer 5:
Assuming that you have access to the logits a and b:
prob_a = tf.nn.softmax(a)
# H(A, A) is the entropy of A; H(A, B) is the cross entropy between A and B
cr_aa = tf.nn.softmax_cross_entropy_with_logits(labels=prob_a, logits=a)
cr_ab = tf.nn.softmax_cross_entropy_with_logits(labels=prob_a, logits=b)
# KL(A || B) = H(A, B) - H(A), summed over the batch
kl_ab = tf.reduce_sum(cr_ab - cr_aa)
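A quick numerical check of that identity, as a sketch (the logit values are arbitrary and only for illustration):

import numpy as np
import tensorflow as tf

a = tf.constant([[1.0, 2.0, 3.0]])  # logits of distribution A
b = tf.constant([[0.5, 0.5, 2.0]])  # logits of distribution B

prob_a = tf.nn.softmax(a)
cr_aa = tf.nn.softmax_cross_entropy_with_logits(labels=prob_a, logits=a)  # H(A, A) = H(A)
cr_ab = tf.nn.softmax_cross_entropy_with_logits(labels=prob_a, logits=b)  # H(A, B)
kl_ab = cr_ab - cr_aa                                                     # KL(A || B), per row

with tf.Session() as sess:
    pa, pb, kl_val = sess.run([prob_a, tf.nn.softmax(b), kl_ab])
    print(kl_val)
    print(np.sum(pa * np.log(pa / pb), axis=1))  # should print the same value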
Answer 6:
I think this might work:
tf.reduce_sum(p * tf.log(p/q))
where p is my actual probability distribution and q is my approximate probability distribution.
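For the batched [None, 1000] tensors from the question this could be written per row, as a sketch (the eps term is an assumption of mine, added to guard against log(0) and division by zero):

import tensorflow as tf

prob_a = tf.placeholder(tf.float32, [None, 1000])
prob_b = tf.placeholder(tf.float32, [None, 1000])

def kl_direct(p, q, eps=1e-8):
    # Sum(p * log(p / q)) over the class axis, one KL value per example;
    # eps is an arbitrary small constant for numerical safety.
    return tf.reduce_sum(p * tf.log((p + eps) / (q + eps)), axis=1)

kl_per_example = kl_direct(prob_a, prob_b)  # shape [None]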
Answer 7:
I used the function from this code (from this Medium post) to calculate the KL divergence of any given tensor from a normal Gaussian distribution, where sd is the standard deviation and mn is the tensor.
latent_loss = -0.5 * tf.reduce_sum(1.0 + 2.0 * sd - tf.square(mn) - tf.exp(2.0 * sd), 1)
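For comparison, here is a sketch of the same closed-form KL( N(mn, σ²) || N(0, 1) ) written with an explicit log-variance term log_var = 2 * sd (the placeholder names and the latent size are assumptions of mine, not part of the original answer):

import tensorflow as tf

mn = tf.placeholder(tf.float32, [None, 64])       # mean of the distribution; 64 is an arbitrary latent size
log_var = tf.placeholder(tf.float32, [None, 64])  # log-variance, i.e. 2 * sd in the notation above

# Closed-form KL( N(mn, exp(log_var)) || N(0, I) ), one value per example.
latent_loss = -0.5 * tf.reduce_sum(1.0 + log_var - tf.square(mn) - tf.exp(log_var), axis=1)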
Source: https://stackoverflow.com/questions/41863814/is-there-a-built-in-kl-divergence-loss-function-in-tensorflow