NaN loss when training regression network

渐次进展 2020-11-29 16:28

I have a data matrix in "one-hot encoding" (all ones and zeros) with 260,000 rows and 35 columns. I am using Keras to train a simple neural network to predict a continuous variable.

17 Answers
  • 2020-11-29 16:30

    I was getting NaN for my classification network. Answering here as it might help someone.

    I had made a blunder: the training labels contained 5 classes (0 to 4), but the last dense layer of the classifier had only 4 nodes, i.e. 4 output classes, which was the issue.

    Changing the number of nodes in the last layer of the network to 5 solved it for me.
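
    As a rough illustration (the hidden-layer size, input shape, and loss below are placeholders of my own, not from the original post), the output layer has to match the number of label classes:

        from tensorflow import keras

        num_classes = 5  # labels run from 0 to 4, so the last layer needs 5 units

        model = keras.Sequential([
            keras.Input(shape=(35,)),
            keras.layers.Dense(64, activation="relu"),
            # a 4-unit output layer here was the cause of the NaN loss
            keras.layers.Dense(num_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",  # integer labels 0..4
                      metrics=["accuracy"])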

  • 2020-11-29 16:31

    Check your data for NaN values. Removing the NaN values solved the problem for me.
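
    A quick way to check is to count missing values per column and drop the offending rows (the toy DataFrame below is just an illustration; substitute your own data):

        import numpy as np
        import pandas as pd

        # stand-in frame with one NaN; replace with your actual training data
        df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [0.0, 1.0, 0.0]})

        print(df.isna().sum())  # NaN count per column
        df = df.dropna()        # remove rows containing NaN before training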

  • 2020-11-29 16:35

    I had the same problem with my Keras CNN. Like others, I tried all the solutions above: decreasing the learning rate, dropping NaNs from the training data, normalizing the data, adding a dropout layer, and so on, but none of them fixed the NaN problem. Then I changed the activation function in the classifier (last) layer from sigmoid to softmax, and it worked! Try changing the activation function of the last layer to softmax.
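
    In Keras that change is just the activation of the final layer; a minimal sketch (the architecture and class count below are placeholders, not the poster's actual CNN):

        from tensorflow import keras

        num_classes = 10  # placeholder; use your own number of classes

        model = keras.Sequential([
            keras.Input(shape=(28, 28, 1)),
            keras.layers.Conv2D(16, 3, activation="relu"),
            keras.layers.Flatten(),
            # was: keras.layers.Dense(num_classes, activation="sigmoid")
            keras.layers.Dense(num_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam", loss="categorical_crossentropy")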

  • 2020-11-29 16:38

    I was getting the loss as NaN in the very first epoch, as soon as training started. A solution as simple as removing the NaNs from the input data (df.dropna()) worked for me.

    I hope this helps someone encountering a similar problem.

  • 2020-11-29 16:39

    I had a similar problem using Keras. The loss turned into NaN after the second batch was fed in.

    I tried to:

    1. Use softmax as the activation of the output dense layer
    2. Drop NaNs from the input
    3. Normalize the input

    However, none of that worked. So I then tried to:

    1. Decrease the learning rate

    Problem solved.
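
    For reference, a minimal way to set a smaller learning rate explicitly in Keras (the model and the value 1e-4 are placeholders of mine, not from the original post):

        from tensorflow import keras

        model = keras.Sequential([
            keras.Input(shape=(35,)),
            keras.layers.Dense(1),  # placeholder regression model
        ])

        # pass an explicit, smaller learning rate instead of the optimizer default
        model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4), loss="mse")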

  • 2020-11-29 16:40

    Regression with neural networks is hard to get working because the output is unbounded, so you are especially prone to the exploding gradients problem (the likely cause of the NaNs).

    Historically, one key solution to exploding gradients was to reduce the learning rate, but with the advent of per-parameter adaptive learning rate algorithms like Adam, you no longer need to set a learning rate to get good performance. There is very little reason to use SGD with momentum anymore unless you're a neural network fiend and know how to tune the learning rate schedule.

    Here are some things you could potentially try (a rough Keras sketch covering points 1 through 4 follows the list):

    1. Normalize your outputs by quantile normalizing or z-scoring. To be rigorous, compute this transformation on the training data, not on the entire dataset. For example, with quantile normalization, if an example is in the 60th percentile of the training set, it gets a value of 0.6. (You can also shift the quantile-normalized values down by 0.5 so that the 0th percentile is -0.5 and the 100th percentile is +0.5.)

    2. Add regularization, either by increasing the dropout rate or adding L1 and L2 penalties to the weights. L1 regularization is analogous to feature selection, and since you said that reducing the number of features to 5 gives good performance, L1 may help here as well.

    3. If these still don't help, reduce the size of your network. This is not always the best idea since it can harm performance, but in your case you have a large number of first-layer neurons (1024) relative to input features (35) so it may help.

    4. Increase the batch size from 32 to 128. 128 is fairly standard and could potentially increase the stability of the optimization.
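
    A minimal sketch of points 1 through 4 together (the data, layer sizes, regularization strength, and train/validation split below are all placeholders I made up for illustration, not values from the question):

        import numpy as np
        from tensorflow import keras

        # stand-in for the one-hot matrix (260,000 rows, 35 columns) and a continuous target
        X = np.random.randint(0, 2, size=(260_000, 35)).astype("float32")
        y = np.random.rand(260_000).astype("float32")

        # point 1: z-score the targets using statistics from the training split only
        split = int(0.9 * len(X))
        X_train, X_val = X[:split], X[split:]
        y_mean, y_std = y[:split].mean(), y[:split].std()
        y_train = (y[:split] - y_mean) / y_std
        y_val = (y[split:] - y_mean) / y_std

        # points 2 and 3: L2 penalty plus dropout, and fewer first-layer neurons than 1024
        model = keras.Sequential([
            keras.Input(shape=(35,)),
            keras.layers.Dense(128, activation="relu",
                               kernel_regularizer=keras.regularizers.l2(1e-4)),
            keras.layers.Dropout(0.3),
            keras.layers.Dense(1),  # linear output for regression
        ])

        # Adam adapts per-parameter learning rates, so no hand-tuned schedule is needed
        model.compile(optimizer="adam", loss="mse")

        # point 4: batch size of 128 instead of 32
        model.fit(X_train, y_train, validation_data=(X_val, y_val),
                  batch_size=128, epochs=5)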
