Question
In my code I'm using Theano to calculate a Euclidean distance matrix (code from here):
import theano
import theano.tensor as T

MAT = T.fmatrix('MAT')
squared_euclidean_distances = (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) - 2 * MAT.dot(MAT.T)
f_euclidean = theano.function([MAT], T.sqrt(squared_euclidean_distances))

def pdist_euclidean(mat):
    return f_euclidean(mat)
But this code produces NaN values in some entries of the matrix. I've read that this can happen when computing theano.tensor.sqrt(), and here it's suggested to

Add an eps inside the sqrt (or max(x,EPs))

So I've added an eps to my code:
import theano
import theano.tensor as T

eps = 1e-9
MAT = T.fmatrix('MAT')
squared_euclidean_distances = (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) - 2 * MAT.dot(MAT.T)
f_euclidean = theano.function([MAT], T.sqrt(eps + squared_euclidean_distances))

def pdist_euclidean(mat):
    return f_euclidean(mat)
So the eps is added before the sqrt is taken. I'm now getting fewer NaNs, but I'm still getting some. What is the proper solution to the problem? I've also noticed that there are no NaNs if MAT is a T.dmatrix().
Answer 1:
There are two likely sources of NaNs when computing Euclidean distances.
1. Floating-point rounding can make a squared distance come out slightly negative when the true value is exactly zero. The square root of a negative number is undefined (assuming you're not interested in the complex solution).
Imagine MAT has the value

[[ 1.62434536 -0.61175641 -0.52817175 -1.07296862  0.86540763]
 [-2.3015387   1.74481176 -0.7612069   0.3190391  -0.24937038]
 [ 1.46210794 -2.06014071 -0.3224172  -0.38405435  1.13376944]
 [-1.09989127 -0.17242821 -0.87785842  0.04221375  0.58281521]]
Now, if we break down the computation we see that

(MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0]))

has value

[[ 10.3838024   14.27675714  13.11072431   7.54348446]
 [ 14.27675714  18.16971188  17.00367905  11.4364392 ]
 [ 13.11072431  17.00367905  15.83764622  10.27040637]
 [  7.54348446  11.4364392   10.27040637   4.70316652]]

and

2 * MAT.dot(MAT.T)

has value

[[ 10.3838024   -9.92394296  10.39763039  -1.51676099]
 [ -9.92394296  18.16971188 -14.23897281   5.53390084]
 [ 10.39763039 -14.23897281  15.83764622  -0.65066204]
 [ -1.51676099   5.53390084  -0.65066204   4.70316652]]
The diagonals of these two matrices should be equal (the distance between a vector and itself is zero), and from this textual representation they look equal, but in fact they differ slightly; the differences are too small to show up when the floating-point values are printed like this.

This becomes apparent when we print the value of the full expression (the second of the matrices above subtracted from the first):

[[ 0.00000000e+00  2.42007001e+01  2.71309392e+00  9.06024545e+00]
 [ 2.42007001e+01 -7.10542736e-15  3.12426519e+01  5.90253836e+00]
 [ 2.71309392e+00  3.12426519e+01  0.00000000e+00  1.09210684e+01]
 [ 9.06024545e+00  5.90253836e+00  1.09210684e+01  0.00000000e+00]]
The diagonal is almost all zeros, but the entry in the second row, second column is now a very small negative value. When you then compute the square root of all these values you get NaN in that position, because the square root of a negative number is undefined (for real numbers):

[[ 0.          4.91942071  1.64714721  3.01002416]
 [ 4.91942071         nan  5.58951267  2.42951402]
 [ 1.64714721  5.58951267  0.          3.30470398]
 [ 3.01002416  2.42951402  3.30470398  0.        ]]
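The effect is easy to reproduce outside Theano. Here is a minimal NumPy sketch of the same arithmetic (an assumed but direct translation of the Theano graph, since both follow IEEE 754 float32 rounding):

```python
import numpy as np

# Build a float32 matrix and apply the same squared-distance formula.
np.random.seed(0)
x = np.random.randn(100, 5).astype(np.float32)
s = (x ** 2).sum(1)
d2 = s[:, None] + s[None, :] - 2 * x @ x.T

# In exact arithmetic the diagonal is zero; with float32 rounding some
# entries can land just below zero, and sqrt turns those into NaN.
print("smallest squared distance:", d2.min())
with np.errstate(invalid="ignore"):
    d = np.sqrt(d2)
print("any NaN produced:", np.isnan(d).any())
```

With float64 (T.dmatrix) the rounding errors are far smaller, which is why the questioner sees no NaNs in that case — the problem is still latent, just less likely to cross zero.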
2. Computing the gradient of a Euclidean distance expression with respect to a variable inside the input to the function. This can happen not only if a negative number is generated due to floating-point approximations, as above, but also if any of the inputs are zero length.
If y = sqrt(x) then dy/dx = 1/(2 * sqrt(x)). So if x = 0 (or, for your purposes, if squared_euclidean_distances = 0), then the gradient will be NaN because 2 * sqrt(0) = 0 and dividing by zero is undefined.
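A small numeric sketch of why this surfaces as NaN rather than merely infinity: in backpropagation the outer derivative 1/(2 * sqrt(x)) is multiplied by the inner derivative of the squared distance, which is itself 0 at a coincident pair. NumPy is used here to mimic IEEE 754 behaviour (an assumption; the original computation runs inside a Theano graph):

```python
import numpy as np

x = np.float64(0.0)
with np.errstate(divide="ignore", invalid="ignore"):
    outer = 1.0 / (2.0 * np.sqrt(x))  # 1/0 -> inf under IEEE 754
    inner = np.float64(0.0)           # inner derivative at a coincident pair
    grad = outer * inner              # inf * 0 -> nan
print(outer, grad)
```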
The first problem can be solved by clamping the squared distances so they can never be negative:
T.sqrt(T.maximum(squared_euclidean_distances, 0.))
To solve both problems (if you need gradients) then you need to make sure the squared distances are never negative or zero, so bound with a small positive epsilon:
T.sqrt(T.maximum(squared_euclidean_distances, eps))
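Both fixes are easy to check numerically. A sketch using the equivalent NumPy expressions (np.maximum standing in for T.maximum, an assumed but direct translation):

```python
import numpy as np

np.random.seed(1)
x = np.random.randn(50, 3).astype(np.float32)
s = (x ** 2).sum(1)
d2 = s[:, None] + s[None, :] - 2 * x @ x.T

eps = 1e-9
d_clamped = np.sqrt(np.maximum(d2, 0.0))  # removes the NaN values
d_eps = np.sqrt(np.maximum(d2, eps))      # also keeps the gradient finite

assert not np.isnan(d_clamped).any()
assert not np.isnan(d_eps).any()
```

Note that this differs from the questioner's `eps + squared_euclidean_distances`: adding eps does not help when a rounding error is more negative than eps, whereas the maximum bounds the value from below unconditionally.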
The first solution makes sense since the problem only arises from approximate representations. The second is a bit more questionable because the true distance is zero, so, in a sense, the gradient really should be undefined. Your specific use case may admit an alternative solution that maintains the semantics without an artificial bound (e.g. by ensuring that gradients are never computed/used for zero-length vectors). But NaN values can be pernicious: they spread like weeds.
Answer 2:
Just checking: in squared_euclidean_distances you're adding a column, a row, and a matrix. Are you sure this is what you want?

More precisely, if MAT is of shape (n, p), you're adding matrices of shapes (n, 1), (1, n) and (n, n). Theano silently repeats the rows (resp. the columns) of each one-dimensional operand to match the number of rows and columns of the dot product.
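Theano follows NumPy-style broadcasting here, so the shape bookkeeping can be sketched with plain NumPy arrays:

```python
import numpy as np

n, p = 4, 5
mat = np.random.randn(n, p)
col = (mat ** 2).sum(1).reshape(n, 1)  # shape (n, 1)
row = (mat ** 2).sum(1).reshape(1, n)  # shape (1, n)
dot = 2 * mat @ mat.T                  # shape (n, n)

# (n, 1) and (1, n) are broadcast against (n, n): the single column and
# the single row are repeated to fill the full matrix.
d2 = col + row - dot
assert d2.shape == (n, n)
```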
If this is what you want, then in reshape you should probably specify ndim=2, according to basic tensor functionality: reshape:

If the shape is a Variable argument, then you might need to use the optional ndim parameter to declare how many elements the shape has, and therefore how many dimensions the reshaped Variable will have.
Also, squared_euclidean_distances should always be positive, unless imprecision errors in the difference turn zero values into small negative values. If that is the case, and if those negative values are responsible for the NaNs you're seeing, you could get rid of them without corrupting your result by wrapping squared_euclidean_distances in abs(...).
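A NumPy sketch of this abs(...) variant (again an assumed translation of the Theano expression; flipping the sign of tiny negative entries leaves genuine distances untouched, since |d2| == d2 whenever d2 >= 0):

```python
import numpy as np

np.random.seed(2)
x = np.random.randn(50, 3).astype(np.float32)
s = (x ** 2).sum(1)
d2 = s[:, None] + s[None, :] - 2 * x @ x.T

# abs() turns a -1e-7 rounding artifact into +1e-7 before the sqrt,
# so no NaN can be produced from near-zero distances.
d = np.sqrt(np.abs(d2))
assert not np.isnan(d).any()
```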
Source: https://stackoverflow.com/questions/31919818/theano-sqrt-returning-nan-values