Question
I have two tensors, prob_a and prob_b, with shape [None, 1000], and I want to compute the KL divergence from prob_a to prob_b. Is there a built-in function for this in TensorFlow? I tried using tf.contrib.distributions.kl(prob_a, prob_b), but it gives:
NotImplementedError: No KL(dist_a || dist_b) registered for dist_a type Tensor and dist_b type Tensor
If there is no built-in function, what would be a good workaround?
Answer 1:
Assuming that your input tensors prob_a and prob_b are probability tensors that sum to 1 along the last axis, you could do it like this:
def kl(x, y):
    X = tf.distributions.Categorical(probs=x)
    Y = tf.distributions.Categorical(probs=y)
    return tf.distributions.kl_divergence(X, Y)

result = kl(prob_a, prob_b)
A simple example:
import numpy as np
import tensorflow as tf
a = np.array([[0.25, 0.1, 0.65], [0.8, 0.15, 0.05]])
b = np.array([[0.7, 0.2, 0.1], [0.15, 0.8, 0.05]])
sess = tf.Session()
print(kl(a, b).eval(session=sess)) # [0.88995184 1.08808468]
You would get the same result with np.sum(a * np.log(a / b), axis=1).
However, this implementation is a bit buggy (checked in TensorFlow 1.8.0). If you have zero probabilities in a, e.g. if you use [0.8, 0.2, 0.0] instead of [0.8, 0.15, 0.05], you will get nan, even though by the Kullback-Leibler definition 0 * log(0 / b) should contribute zero. To mitigate this, one should add some small numerical constant. It is also prudent to use tf.distributions.kl_divergence(X, Y, allow_nan_stats=False) to cause a runtime error in such situations. Also, if there are zeros in b, you will get inf values, which won't be caught by the allow_nan_stats=False option, so those have to be handled as well.
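One possible mitigation, as a sketch (the smoothing constant eps and the helper name kl_smoothed are my own additions, not part of the original answer): add a small constant to both inputs and renormalize before building the distributions.

import tensorflow as tf

def kl_smoothed(x, y, eps=1e-8):
    # Shift both distributions away from exact zeros and renormalize;
    # eps is an arbitrary small constant and may need tuning.
    x = (x + eps) / tf.reduce_sum(x + eps, axis=-1, keepdims=True)
    y = (y + eps) / tf.reduce_sum(y + eps, axis=-1, keepdims=True)
    X = tf.distributions.Categorical(probs=x)
    Y = tf.distributions.Categorical(probs=y)
    # allow_nan_stats=False causes a runtime error instead of silently
    # producing nan, as described above.
    return tf.distributions.kl_divergence(X, Y, allow_nan_stats=False)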
Answer 2:
Since softmax_cross_entropy_with_logits is available, there is no need to optimize the KL divergence directly; it differs from the cross entropy only by a constant:
KL(prob_a, prob_b)
= Sum(prob_a * log(prob_a/prob_b))
= Sum(prob_a * log(prob_a) - prob_a * log(prob_b))
= - Sum(prob_a * log(prob_b)) + Sum(prob_a * log(prob_a))
= - Sum(prob_a * log(prob_b)) + const
= H(prob_a, prob_b) + const
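As a sketch of what this means in practice (the names prob_a and logits_b are placeholders I am assuming, using the [None, 1000] shape from the question): when prob_a is held fixed, minimizing the cross entropy below also minimizes the KL divergence, because the two differ only by the constant Sum(prob_a * log(prob_a)).

import tensorflow as tf

prob_a = tf.placeholder(tf.float32, [None, 1000])    # fixed target distribution
logits_b = tf.placeholder(tf.float32, [None, 1000])  # logits of the distribution being trained

# H(prob_a, softmax(logits_b)); equals KL(prob_a || softmax(logits_b)) up to
# a constant that does not depend on logits_b.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=prob_a, logits=logits_b)
loss = tf.reduce_mean(cross_entropy)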
Answer 3:
I'm not sure why it's not implemented, but perhaps there is a workaround. The KL divergence is defined as:
KL(prob_a, prob_b) = Sum(prob_a * log(prob_a/prob_b))
The cross entropy H, on the other hand, is defined as:
H(prob_a, prob_b) = -Sum(prob_a * log(prob_b))
So, if you create a variable y = prob_a / prob_b, you could obtain the KL divergence as the negative of H(prob_a, y). In TensorFlow notation, something like:
KL = tf.reduce_mean(-tf.nn.softmax_cross_entropy_with_logits(labels=prob_a, logits=y))
Answer 4:
tf.contrib.distributions.kl takes instances of tf.distributions.Distribution, not a Tensor.
Example:
import tensorflow as tf

ds = tf.contrib.distributions
p = ds.Normal(loc=0., scale=1.)
q = ds.Normal(loc=1., scale=2.)
kl = ds.kl_divergence(p, q)
# ==> 0.44314718
Answer 5:
Assuming that you have access to the logits a and b:
prob_a = tf.nn.softmax(a)
# H(A, A) is the entropy of A; H(A, B) is the cross entropy between A and B
cr_aa = tf.nn.softmax_cross_entropy_with_logits(labels=prob_a, logits=a)
cr_ab = tf.nn.softmax_cross_entropy_with_logits(labels=prob_a, logits=b)
# KL(A || B) = H(A, B) - H(A), summed over the batch
kl_ab = tf.reduce_sum(cr_ab - cr_aa)
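A quick numerical check of that identity, as a sketch (the logit values are arbitrary and only for illustration):

import numpy as np
import tensorflow as tf

a = tf.constant([[1.0, 2.0, 3.0]])  # logits of distribution A
b = tf.constant([[0.5, 0.5, 2.0]])  # logits of distribution B

prob_a = tf.nn.softmax(a)
cr_aa = tf.nn.softmax_cross_entropy_with_logits(labels=prob_a, logits=a)  # H(A, A) = H(A)
cr_ab = tf.nn.softmax_cross_entropy_with_logits(labels=prob_a, logits=b)  # H(A, B)
kl_ab = cr_ab - cr_aa                                                     # KL(A || B), per row

with tf.Session() as sess:
    pa, pb, kl_val = sess.run([prob_a, tf.nn.softmax(b), kl_ab])
    print(kl_val)
    print(np.sum(pa * np.log(pa / pb), axis=1))  # should print the same value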
Answer 6:
I think this might work:
tf.reduce_sum(p * tf.log(p/q))
where p is my actual probability distribution and q is my approximate probability distribution.
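For the batched [None, 1000] tensors from the question this could be written per row, as a sketch (the eps term is an assumption of mine, added to guard against log(0) and division by zero):

import tensorflow as tf

prob_a = tf.placeholder(tf.float32, [None, 1000])
prob_b = tf.placeholder(tf.float32, [None, 1000])

def kl_direct(p, q, eps=1e-8):
    # Sum(p * log(p / q)) over the class axis, one KL value per example;
    # eps is an arbitrary small constant for numerical safety.
    return tf.reduce_sum(p * tf.log((p + eps) / (q + eps)), axis=1)

kl_per_example = kl_direct(prob_a, prob_b)  # shape [None]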
Answer 7:
I used the function from this code (from this Medium post) to calculate the KL divergence of any given tensor from a normal Gaussian distribution, where sd is the standard deviation and mn is the tensor.
latent_loss = -0.5 * tf.reduce_sum(1.0 + 2.0 * sd - tf.square(mn) - tf.exp(2.0 * sd), 1)
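For comparison, here is a sketch of the same closed-form KL( N(mn, σ²) || N(0, 1) ) written with an explicit log-variance term log_var = 2 * sd (the placeholder names and the latent size are assumptions of mine, not part of the original answer):

import tensorflow as tf

mn = tf.placeholder(tf.float32, [None, 64])       # mean of the distribution; 64 is an arbitrary latent size
log_var = tf.placeholder(tf.float32, [None, 64])  # log-variance, i.e. 2 * sd in the notation above

# Closed-form KL( N(mn, exp(log_var)) || N(0, I) ), one value per example.
latent_loss = -0.5 * tf.reduce_sum(1.0 + log_var - tf.square(mn) - tf.exp(log_var), axis=1)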
Source: https://stackoverflow.com/questions/41863814/is-there-a-built-in-kl-divergence-loss-function-in-tensorflow