Perhaps too general a question, but can anyone explain what would cause a Convolutional Neural Network to diverge?
Specifics:
I am using TensorFlow's iris_tra
Regularization can help. For a classifier, there is a good case for activity regularization, whether it is a binary or a multi-class classifier. For a regressor, kernel regularization might be more appropriate.
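In Keras this could look roughly like the sketch below; the layer sizes and the l2 factor are arbitrary placeholders, not values from this answer.
from keras.layers import Dense
from keras import regularizers

# Classifier: penalize large activations of the layer's output.
clf_hidden = Dense(64, activation='relu',
                   activity_regularizer=regularizers.l2(1e-4))

# Regressor: penalize large weights instead.
reg_output = Dense(1,
                   kernel_regularizer=regularizers.l2(1e-4))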
If you're training for cross entropy, you want to add a small number like 1e-8 to your output probability.
Because log(0) is negative infinity: once your model has trained enough, the output distribution will be very skewed. For instance, say I'm doing a 4-class output; in the beginning my probabilities look like
0.25 0.25 0.25 0.25
but toward the end the probabilities will probably look like
1.0 0 0 0
If you take the cross entropy of this distribution, everything will explode. The fix is to artificially add a small number to all the terms to prevent this.
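A rough sketch of what that looks like in TensorFlow (the function name and the 1e-8 value are just illustrative):
import tensorflow as tf

def stable_cross_entropy(y_true, y_pred, eps=1e-8):
    # y_true: one-hot targets, y_pred: predicted probabilities.
    # The eps keeps log() away from log(0) = -inf when a probability
    # collapses to exactly 0, which is what produces the NaNs.
    return -tf.reduce_mean(
        tf.reduce_sum(y_true * tf.math.log(y_pred + eps), axis=-1))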
If using integers as targets, make sure they aren't symmetric around 0.
I.e., don't use the classes -1, 0, 1; use 0, 1, 2 instead.
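For example (just an illustrative remapping):
import numpy as np

y = np.array([-1, 0, 1, 1, -1])  # labels symmetric around 0
y = y + 1                        # remap to 0, 1, 2 before training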
The reason for nan, inf or -inf often comes from the fact that division by 0.0 in TensorFlow doesn't result in a division-by-zero exception. It could result in a nan, inf or -inf "value". Your training data might contain 0.0, and thus in your loss function it could happen that you perform a division by 0.0.
a = tf.constant([2., 0., -2.])
b = tf.constant([0., 0., 0.])
c = tf.constant([1., 1., 1.])
print((a / b) + c)
Output is the following tensor:
tf.Tensor([ inf nan -inf], shape=(3,), dtype=float32)
Adding a small epsilon (e.g., 1e-5) to the denominator often does the trick. Additionally, since TensorFlow 2 the operation tf.math.divide_no_nan is available.
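Applied to the example above, both fixes could look like this (the 1e-5 is just a placeholder value):
import tensorflow as tf

a = tf.constant([2., 0., -2.])
b = tf.constant([0., 0., 0.])
c = tf.constant([1., 1., 1.])

# Option 1: add a small epsilon to the denominator.
print((a / (b + 1e-5)) + c)              # finite values, no nan/inf

# Option 2: divide_no_nan returns 0 wherever the denominator is 0.
print(tf.math.divide_no_nan(a, b) + c)   # -> [1. 1. 1.]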
In my case I got NaN when using very distant integer labels, i.e. a label far outside the range of the other labels.
So, don't use a very distant label.
EDIT: You can see the effect in the following simple code:
from keras.models import Sequential
from keras.layers import Dense, Activation
import numpy as np
X=np.random.random(size=(20,5))
y=np.random.randint(0,high=5, size=(20,1))
model = Sequential([
    Dense(10, input_dim=X.shape[1]),
    Activation('relu'),
    Dense(5),
    Activation('softmax')
])
model.compile(optimizer="Adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
print('fit model with labels in range 0..5')
history = model.fit(X, y, epochs= 5 )
# add one more sample whose integer label (8000) is far outside the 0..4 range
X = np.vstack((X, np.random.random(size=(1, 5))))
y = np.vstack((y, [[8000]]))
print('fit model with labels in range 0..5 plus 8000')
history = model.fit(X, y, epochs= 5 )
The result shows the NaNs after adding the label 8000: with only 5 softmax outputs, a target of 8000 has no corresponding probability, so the sparse cross-entropy blows up.
fit model with labels in range 0..5
Epoch 1/5
20/20 [==============================] - 0s 25ms/step - loss: 1.8345 - acc: 0.1500
Epoch 2/5
20/20 [==============================] - 0s 150us/step - loss: 1.8312 - acc: 0.1500
Epoch 3/5
20/20 [==============================] - 0s 151us/step - loss: 1.8273 - acc: 0.1500
Epoch 4/5
20/20 [==============================] - 0s 198us/step - loss: 1.8233 - acc: 0.1500
Epoch 5/5
20/20 [==============================] - 0s 151us/step - loss: 1.8192 - acc: 0.1500
fit model with labels in range 0..5 plus 8000
Epoch 1/5
21/21 [==============================] - 0s 142us/step - loss: nan - acc: 0.1429
Epoch 2/5
21/21 [==============================] - 0s 238us/step - loss: nan - acc: 0.2381
Epoch 3/5
21/21 [==============================] - 0s 191us/step - loss: nan - acc: 0.2381
Epoch 4/5
21/21 [==============================] - 0s 191us/step - loss: nan - acc: 0.2381
Epoch 5/5
21/21 [==============================] - 0s 188us/step - loss: nan - acc: 0.2381
I'd like to plug in some (shallow) reasons I have experienced as follows:
Hope that helps.