Why are my TensorFlow network weights and costs NaN when I use RELU activations?

天命终不由人 asked 2021-02-01 10:16

I can't get TensorFlow ReLU activations (neither tf.nn.relu nor tf.nn.relu6) working without NaN values for activations and weights killing my training.

3 Answers
  • 2021-02-01 10:56

    Following He et al. (as suggested in lejlot's comment), initializing the weights of the l-th layer to a zero-mean Gaussian distribution with standard deviation sqrt(2 / n_l),

    where n_l is the flattened length of the input vector to that layer, or, in code:

    stddev = np.sqrt(2 / np.prod(input_tensor.get_shape().as_list()[1:]))
    

    results in weights that generally do not diverge.
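
    For example, a fully connected ReLU layer built with this initialization could look like the following sketch (he_dense, input_tensor, and num_units are placeholder names, not from the question; TF 1.x API):

    import numpy as np
    import tensorflow as tf

    def he_dense(input_tensor, num_units):
        # Flattened fan-in of the layer's input (all dims except the batch dim).
        fan_in = int(np.prod(input_tensor.get_shape().as_list()[1:]))
        # Zero-mean Gaussian with stddev sqrt(2 / fan_in), per He et al.
        w = tf.Variable(tf.truncated_normal([fan_in, num_units],
                                            stddev=np.sqrt(2.0 / fan_in)))
        b = tf.Variable(tf.zeros([num_units]))
        flat = tf.reshape(input_tensor, [-1, fan_in])
        return tf.nn.relu(tf.matmul(flat, w) + b)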

  • 2021-02-01 11:11

    If you use a softmax classifier at the top of your network, try to make the initial weights of the layer just below the softmax very small (e.g. std=1e-4). This makes the initial distribution of outputs of the network very soft (high temperature), and helps ensure that the first few steps of your optimization are not too large and numerically unstable.
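
    As a rough sketch (penultimate, penultimate_size, and num_classes are placeholder names; TF 1.x API), the layer feeding the softmax could be initialized like this:

    import tensorflow as tf

    # Very small initial weights keep the initial logits near zero, so the
    # softmax output starts out close to uniform (high temperature).
    w_out = tf.Variable(tf.truncated_normal([penultimate_size, num_classes], stddev=1e-4))
    b_out = tf.Variable(tf.zeros([num_classes]))
    logits = tf.matmul(penultimate, w_out) + b_out
    probs = tf.nn.softmax(logits)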

  • 2021-02-01 11:22

    Have you tried gradient clipping and/or a smaller learning rate?

    Basically, you will need to process your gradients before applying them, as follows (from tf docs, mostly):

    # Replace this with what follows
    # opt = tf.train.MomentumOptimizer(0.02, momentum=0.5).minimize(cross_entropy_loss)
    
    # Create an optimizer.
    opt = tf.train.MomentumOptimizer(learning_rate=0.001, momentum=0.5)
    
    # Compute the gradients for a list of variables.
    grads_and_vars = opt.compute_gradients(cross_entropy_loss, tf.trainable_variables())
    
    # grads_and_vars is a list of (gradient, variable) tuples.  Do whatever you
    # need to the 'gradient' part, for example cap the values as below.
    capped_grads_and_vars = [(tf.clip_by_value(g, -5., 5.), v) for g, v in grads_and_vars]
    
    # Ask the optimizer to apply the capped gradients.
    train_op = opt.apply_gradients(capped_grads_and_vars)
    

    Also, the discussion in this question might help.
