Why is Tensorflow's Gradient Tape returning None when trying to find the gradient of loss wrt input?

Question

I have a CNN model built in Keras that uses an SVM as its last layer. To get the SVM's prediction, I pass an input through the CNN model, extract the relevant features, and feed those features into the SVM. This entire process is named predict_DNR_tensor in the code below. It works fine and gives a correct prediction. I am now trying to get the gradient of the squared hinge loss of this SVM prediction with respect to the original input (see the code). However, GradientTape doesn't seem to work here: tape.gradient just returns None. When I use it with the prediction output by the CNN model alone (without the SVM), it is fine and gives me a gradient. Why?

import numpy as np
import tensorflow as tf
import tensorflow.keras.losses as losses


x = np.expand_dims(X_train[0,:,:,:],axis=0)
x = tf.convert_to_tensor(x)

with tf.GradientTape() as tape:
  tape.watch(x)

  ##
  y_pred = predict_DNR_tensor(x)/2 # dividing by 2 to normalise back into the [0, 1] range
  y_pred = tf.convert_to_tensor(y_pred, dtype="float32")
  ##

  y_pred2 = CNN_model(x)

  y_true = np.expand_dims(y_train[0,:],axis=0)
  loss = losses.squared_hinge(y_true,y_pred)
  loss2 = losses.squared_hinge(y_true,y_pred2)

gradient = tape.gradient(loss,x)

The variables used are as follows:

y_true = array([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.]], dtype=float32)  

y_pred = <tf.Tensor: id=84063, shape=(1, 10), dtype=float32, numpy=
array([[-0.51142603, -0.51385206, -0.5131374 , -0.52496594, -0.51574653,
         0.54295117, -0.5148362 , -0.51094234, -0.52781606, -0.53384954]],
      dtype=float32)>  

y_pred2 = <tf.Tensor: id=84105, shape=(1, 10), dtype=float32, numpy=
array([[9.1292924e-05, 6.4014189e-06, 1.2363887e-05, 2.6787011e-02,
        2.7567458e-07, 9.7225791e-01, 2.2164610e-04, 1.3467512e-06,
        5.6831568e-04, 5.3546366e-05]], dtype=float32)>


loss = <tf.Tensor: id=84125, shape=(1,), dtype=float32, numpy=array([0.22959474], dtype=float32)>

loss2 = <tf.Tensor: id=84384, shape=(1,), dtype=float32, numpy=array([0.9056972], dtype=float32)>

When I calculate the gradient using loss, I get None. When I calculate the gradient using loss2, I get an array of values as expected. The only difference between loss and loss2 is y_pred versus y_pred2. As I understand it, y_pred2 is just the output prediction of a CNN model built in Keras. (Note: my loss is not really correct for this model; I was just curious to see whether it would produce a gradient if I used the output of this model.)

However, y_pred, which is what I am actually interested in, comes from the outputs of an SVM used as the last layer of the CNN model, i.e. it takes the features the CNN model produces for this input image and then puts those features into a separate SVM model to get these outputs.
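The two outcomes can be reproduced in isolation with a minimal sketch. It assumes that predict_DNR_tensor leaves TensorFlow at some point, for example by converting the CNN features to a NumPy array for a scikit-learn SVM; that is an assumption, since the body of predict_DNR_tensor is not shown. The names w and external_predict are hypothetical stand-ins for the CNN feature extractor and the SVM, not the real models.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the CNN feature extractor: one fixed weight matrix.
w = tf.ones((3, 4), dtype=tf.float32)

def external_predict(features_np):
    # Hypothetical stand-in for a model that lives outside TensorFlow
    # (e.g. a scikit-learn SVM): it only ever sees a NumPy array.
    return np.tanh(features_np)

x = tf.constant([[0.1, 0.2, 0.3]], dtype=tf.float32)

with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    features = tf.matmul(x, w)

    # Path A: the computation goes through NumPy, so the tape records
    # nothing that connects x to y_a.
    y_a = tf.convert_to_tensor(external_predict(features.numpy()), dtype=tf.float32)
    loss_a = tf.reduce_sum(tf.square(y_a))

    # Path B: the same math written with TensorFlow ops stays on the tape.
    y_b = tf.tanh(features)
    loss_b = tf.reduce_sum(tf.square(y_b))

grad_a = tape.gradient(loss_a, x)  # no recorded path from x to loss_a
grad_b = tape.gradient(loss_b, x)  # a tensor with the same shape as x
del tape

print(grad_a)
print(grad_b)
```

In this sketch y_a and y_b have the same dtype, shape and values, yet only loss_b is connected to x on the tape, which matches the difference seen between loss and loss2.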

y_pred and y_pred2 seem similar in datatype and shape, regardless of having different values. Why, then, can't a gradient be computed from y_pred? And is there a way to fix it?
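One possible direction, sketched under the assumption that the SVM is linear: its decision function is then an affine map of the features, which can be rewritten with TensorFlow ops so that the tape can follow it. The arrays coef and intercept below are made-up values standing in for a fitted linear SVM's coef_ and intercept_ attributes, and svm_decision_tf is a hypothetical helper. This does not carry over directly to a kernel SVM.

```python
import numpy as np
import tensorflow as tf

# Made-up values standing in for a fitted linear SVM's coef_ and intercept_
# (shapes: (n_classes, n_features) and (n_classes,)).
coef = np.array([[0.5, -1.0, 0.25],
                 [-0.75, 0.5, 1.0]], dtype=np.float32)
intercept = np.array([0.1, -0.2], dtype=np.float32)

def svm_decision_tf(features):
    # Hypothetical helper: features @ coef.T + intercept, written with
    # TensorFlow ops so the operation is recorded on the tape.
    return tf.matmul(features, coef, transpose_b=True) + intercept

# Stands in for the CNN features of one input; in the full pipeline the
# CNN feature extractor would be called on the image inside the tape.
x = tf.constant([[1.0, 2.0, 3.0]], dtype=tf.float32)
y_true = tf.constant([[1.0, -1.0]], dtype=tf.float32)  # labels in {-1, 1}

with tf.GradientTape() as tape:
    tape.watch(x)
    scores = svm_decision_tf(x)
    loss = tf.keras.losses.squared_hinge(y_true, scores)

gradient = tape.gradient(loss, x)
print(gradient)
```

Because every step from x to loss is a TensorFlow op here, tape.gradient returns a tensor with the same shape as x instead of None.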

Source: https://stackoverflow.com/questions/61367182/why-is-tensorflows-gradient-tape-returning-none-when-trying-to-find-the-gradien

Tags

machine-learning

keras

scikit-learn

tensorflow2.0

gradienttape