NaN loss when training regression network

Asked by 渐次进展 on 2020-11-29 16:28

I have a data matrix in "one-hot encoding" (all ones and zeros) with 260,000 rows and 35 columns. I am using Keras to train a simple neural network to predict a continuous variable.

17 Answers
  • 2020-11-29 16:48

    I faced a very similar problem, and this is how I got it to run.

    The first thing you can try is changing your activation to LeakyReLU instead of using ReLU or Tanh. The reason is that, often, many of the nodes within your layers have an activation of zero, and backpropagation doesn't update the weights for these nodes because their gradient is also zero. This is also called the 'dying ReLU' problem (you can read more about it here: https://datascience.stackexchange.com/questions/5706/what-is-the-dying-relu-problem-in-neural-networks).

    To do this, you can import the LeakyReLU activation using:

    from keras.layers.advanced_activations import LeakyReLU  # standalone Keras 2.x import path
    

    and incorporate it within your layers like this:

    model.add(Dense(800, input_shape=(num_inputs,)))  # no activation here; LeakyReLU is added as its own layer
    model.add(LeakyReLU(alpha=0.1))                   # small negative slope keeps gradients alive for negative inputs
    

    Additionally, it is possible that the output feature (the continuous variable you are trying to predict) is imbalanced and has too many 0s. One way to fix this is to use smoothing: add 1 to every value in this column, then divide each value by 1/(the average of all the values in this column).

    This essentially shifts all the values away from 0 to something greater than 0 (which may still be very small). It prevents the model from simply predicting 0s to minimize the loss (and eventually making it NaN). Smaller values are affected more than larger ones, but on the whole the average of the data set stays the same; a rough sketch of the transformation follows below.
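    As a rough illustration (not from the original answer), the two steps described above look like this on a toy target vector; the values here are made up purely for demonstration:

    import numpy as np

    # Toy stand-in for the continuous target column (hypothetical values).
    y = np.array([0.0, 0.0, 0.2, 1.5, 3.0])

    # Step 1: shift every value away from exact zero.
    # Step 2: divide by 1/mean(y), i.e. multiply by the original mean, as described above.
    y_smoothed = (y + 1) / (1.0 / y.mean())
    print(y_smoothed)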

  • 2020-11-29 16:51

    The answer by 1" is quite good. However, all of those fixes address the issue indirectly rather than directly. I would recommend using gradient clipping, which clips any gradients whose norm exceeds a certain value.

    In Keras you can pass clipnorm=1 to the optimizer (see https://keras.io/optimizers/) to clip all gradients with a norm above 1.
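    For example, a minimal sketch with the standalone Keras 2.x API (the model layout here is made up; only the clipnorm argument matters):

    from keras.models import Sequential
    from keras.layers import Dense
    from keras.optimizers import SGD

    # Any gradient whose L2 norm exceeds 1 is rescaled to norm 1 before the weight update.
    model = Sequential([
        Dense(64, activation='relu', input_shape=(35,)),
        Dense(1),
    ])
    model.compile(optimizer=SGD(lr=0.01, clipnorm=1.0), loss='mean_squared_error')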

  • 2020-11-29 16:51

    I had the same problem with my RNN with Keras LSTM layers, so I tried each solution from above. I had already scaled my data (with sklearn.preprocessing.MinMaxScaler), and there were no NaN values in my data after scaling. Solutions like using LeakyReLU or changing the learning rate didn't help.

    So I decided to change the scaler from MinMaxScaler to StandardScaler, even though I had no NaN values. I found it odd, but it worked!
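    For reference, switching scalers is a one-line change; the feature matrix below is random stand-in data, not the data from this answer:

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    # Toy stand-in for the training features (hypothetical data).
    X_train = np.random.rand(100, 35)

    # Unlike MinMaxScaler, StandardScaler centres each column to zero mean
    # and unit variance, which keeps feature magnitudes comparable.
    X_train_scaled = StandardScaler().fit_transform(X_train)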

  • 2020-11-29 16:52

    I faced the same problem before. I searched and found this question and its answers. All the tricks mentioned above are important for training a deep neural network. I tried them all, but still got NaN.

    I also found this issue: https://github.com/fchollet/keras/issues/2134. I quote the author's summary as follows:

    I wanted to point this out so that it's archived for others who may experience this problem in the future. I was running into my loss function suddenly returning a NaN after it got so far into the training process. I checked the relus, the optimizer, the loss function, my dropout in accordance with the relus, the size of my network and the shape of the network. I was still getting loss that eventually turned into a NaN and I was getting quite frustrated.

    Then it dawned on me. I may have some bad input. It turns out, one of the images that I was handing to my CNN (and doing mean normalization on) was nothing but 0's. I wasn't checking for this case when I subtracted the mean and normalized by the std deviation and thus I ended up with an exemplar matrix which was nothing but nan's. Once I fixed my normalization function, my network now trains perfectly.

    I agree with the above viewpoint: the network is sensitive to its input. In my case, I use the log value of a density estimate as an input. Its absolute value can be very large, which may result in NaN after a few gradient steps. I think an input check is necessary: first, make sure the input does not include -inf or inf, or extremely large absolute values.
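    A quick sanity check along those lines might look like this (the array here is random placeholder data; in practice you would run it on your real inputs):

    import numpy as np

    # Toy stand-in for the model inputs (hypothetical data).
    X = np.random.rand(1000, 35)

    # Fail fast if anything is NaN or +/-inf, and report the largest magnitude.
    assert np.all(np.isfinite(X)), 'input contains NaN or inf'
    print('max |x| =', np.abs(X).max())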

  • 2020-11-29 16:53

    I had the same problem while using Keras for a multivariate regression problem. What I later realised was that some values in my dataset were NaN, and that led to a NaN loss. I used the command:

    df = df.dropna()  # drop every row that contains at least one NaN

    And it resolved my issue.
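    Before dropping rows, it can also help to see where the NaNs are; a small sketch with made-up data:

    import numpy as np
    import pandas as pd

    # Toy DataFrame with one missing value (hypothetical data).
    df = pd.DataFrame({'a': [1.0, 2.0, np.nan], 'b': [0.1, 0.2, 0.3]})

    print(df.isnull().sum())  # NaN count per column
    df = df.dropna()          # remove the offending rows before training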
