Keras gradient wrt something else

Submitted by 亡梦爱人 on 2021-01-28 11:24:36

Question


I am working to implement the method described in the article https://drive.google.com/file/d/1s-qs-ivo_fJD9BU_tM5RY8Hv-opK4Z-H/view . The final algorithm to use is the one on page 6, where:

  • d is a unit vector
  • xi (ξ) is a non-zero number
  • D is the loss function (sparse cross-entropy in my case)

The idea is to do adversarial training: modify the data in the direction where the network is most sensitive to small changes, then train the network on the modified data but with the same label as the original data.
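In plain TensorFlow terms, my understanding of one such training step is roughly the following (only a rough sketch of the idea, not the paper's exact algorithm; model, loss_fn, optimizer, epsilon, x_batch and y_batch are placeholder names):

# Rough sketch of one adversarial training step (illustrative only)
with tf.GradientTape() as tape:
    tape.watch(x_batch)                                    # treat the inputs as differentiable
    loss_clean = loss_fn(y_batch, model(x_batch, training=False))
g = tape.gradient(loss_clean, x_batch)                     # direction of greatest sensitivity

# Perturb each sample by a small step of size epsilon along that direction
r_adv = epsilon * g / tf.norm(g, axis=1, keepdims=True)

with tf.GradientTape() as tape:
    # Same labels, perturbed inputs
    loss_adv = loss_fn(y_batch, model(x_batch + r_adv, training=True))
grads = tape.gradient(loss_adv, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))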

I am trying to implement this method in Keras with the MNIST dataset and mini-batches of 100 samples, but I can't get my head around the computation of the gradient with respect to r (first line of the 3rd step of the algorithm). I can't figure out how to compute it with Keras. Here is my code:

import numpy as np
import tensorflow as tf
from tensorflow.keras import losses

loss = losses.SparseCategoricalCrossentropy()

for epoch in range(5):
    print(f"Start of epoch {epoch}")
    for step, (xBatchTrain, yBatchTrain) in enumerate(trainDataset):
        # Generating the 100 unit vectors
        randomVectors = np.random.random(xBatchTrain.shape)
        U = randomVectors / np.linalg.norm(randomVectors, axis=1)[:, None]

        # Generating the r vectors
        Xi = 2
        R = tf.convert_to_tensor(U * Xi, dtype='float32')

        dataNoised = xBatchTrain + R

        with tf.GradientTape(persistent=True) as imTape:
            imTape.watch(R)
            # Getting the losses
            C = [loss(label, pred) for label, pred in zip(yBatchTrain, dumbModel(dataNoised, training=False))]

        # Getting the gradient wrt r for each image
        for l, r in zip(C, R):
            print(imTape.gradient(l, r))

The "print" line returns None for every sample. I should return me a vector of 784 values, each for one pixel?

(I apologize if parts of the code are ugly; I am new to Keras, TensorFlow and deep learning.)

[EDIT]

Here is a gist with the whole notebook: https://gist.github.com/DridriLaBastos/136a8e9d02b311e82fe22ec1c2850f78


Answer 1:


First, move dataNoised = xBatchTrain + R inside the with tf.GradientTape(persistent=True) as imTape: block, so that the operation involving R is recorded on the tape.

Second, instead of using:

for l,r in zip(C,R):
    print(imTape.gradient(l,r))

you should use imTape.gradient(C, R) to get the full set of gradients, since iterating with zip breaks the operation dependency on the tensor R. Printing the result gives something like the following, with the same shape as xBatchTrain:

tf.Tensor(
[[-1.4924371e-06  1.0490652e-05 -1.8195267e-05 ...  1.5640746e-05
   3.3767541e-05 -2.0983218e-05]
 [ 2.3668531e-02  1.9133706e-02  3.1396169e-02 ... -1.4431887e-02
   5.3144591e-03  6.2225698e-03]
 [ 2.0492254e-03  7.1049971e-04  1.6121448e-03 ... -1.0579333e-03
   2.4968456e-03  8.3572773e-04]
 ...
 [-4.5572519e-03  6.2278998e-03  6.8322839e-03 ... -2.1966733e-03
   1.0822206e-03  1.8687058e-03]
 [-6.3691144e-03 -4.1699030e-02 -9.3158096e-02 ... -2.9496195e-02
  -7.0264392e-02 -3.2520775e-02]
 [-1.4666058e-02  2.0758331e-02  2.9009990e-02 ... -3.2206681e-02
   3.1550713e-02  4.9267178e-03]], shape=(100, 784), dtype=float32)
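
Putting both changes together, the inner part of the training loop from the question would look roughly like this (a sketch only, reusing the trainDataset, dumbModel and loss objects defined in the question's code/gist):

for step, (xBatchTrain, yBatchTrain) in enumerate(trainDataset):
    # Generate one random unit vector per sample
    randomVectors = np.random.random(xBatchTrain.shape)
    U = randomVectors / np.linalg.norm(randomVectors, axis=1)[:, None]

    # Scale the unit vectors by Xi to get the perturbations r
    Xi = 2
    R = tf.convert_to_tensor(U * Xi, dtype='float32')

    with tf.GradientTape(persistent=True) as imTape:
        imTape.watch(R)
        # x + r is now computed inside the tape, so the op is recorded
        dataNoised = xBatchTrain + R
        # Per-sample losses
        C = [loss(label, pred)
             for label, pred in zip(yBatchTrain, dumbModel(dataNoised, training=False))]

    # One gradient call over the whole batch: shape (100, 784), like xBatchTrain
    gradR = imTape.gradient(C, R)
    print(gradR)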


Source: https://stackoverflow.com/questions/65636637/keras-gradient-wrt-something-else
