How does Adagrad work in Keras? What does self.weights mean in a Keras Optimizer?

Submitted by 狂风中的少年 on 2019-12-21 05:16:28

Question


For example, Keras' Adagrad is implemented as follows:

from keras import backend as K
from keras.optimizers import Optimizer


class Adagrad(Optimizer):
    """Adagrad optimizer.

    It is recommended to leave the parameters of this optimizer
    at their default values.

    # Arguments
        lr: float >= 0. Learning rate.
        epsilon: float >= 0.
        decay: float >= 0. Learning rate decay over each update.

    # References
        - [Adaptive Subgradient Methods for Online Learning and Stochastic Optimization](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
    """

    def __init__(self, lr=0.01, epsilon=1e-8, decay=0., **kwargs):
        super(Adagrad, self).__init__(**kwargs)
        self.lr = K.variable(lr)
        self.epsilon = epsilon
        self.decay = K.variable(decay)
        self.initial_decay = decay
        self.iterations = K.variable(0.)

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        shapes = [K.get_variable_shape(p) for p in params]
        accumulators = [K.zeros(shape) for shape in shapes]
        self.weights = accumulators
        self.updates = []

        lr = self.lr
        if self.initial_decay > 0:
            lr *= (1. / (1. + self.decay * self.iterations))
            self.updates.append(K.update_add(self.iterations, 1))

        for p, g, a in zip(params, grads, accumulators):
            new_a = a + K.square(g)  # update accumulator
            self.updates.append(K.update(a, new_a))
            new_p = p - lr * g / (K.sqrt(new_a) + self.epsilon)
            # apply constraints
            if p in constraints:
                c = constraints[p]
                new_p = c(new_p)
            self.updates.append(K.update(p, new_p))
        return self.updates

The function get_updates() looks like a single update step. But shouldn't the accumulators store history information across steps? Why are they initialized to zeros here, as if at each step? How can they act as accumulators over the whole training process?

What does this line do?

self.weights = accumulators

It seems self.weights is never used anywhere afterwards.


Answer 1:


You are correct: for all optimizers in Keras, get_updates() implements the tensor logic for one step of updates. However, it is called only once per model.fit(), from _make_train_function(), which builds the training function by passing the returned update ops as updates=. That same update rule is then executed iteration after iteration to update the model parameters and the optimizer's internal state. So the accumulators are created as zeros only once, when the graph is built; on every subsequent iteration the K.update(a, new_a) op mutates those same persistent tensors in place, which is what makes them accumulate over the whole training process.
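To make the graph-built-once, executed-many-times behavior concrete, here is a minimal NumPy sketch of the same accumulator pattern (adagrad_step and the toy values are illustrative, not Keras API). The accumulator array is created once, like K.zeros in get_updates(), and then mutated in place on every step, like K.update:

```python
import numpy as np

def adagrad_step(params, grads, accumulators, lr=0.01, epsilon=1e-8):
    """One Adagrad step. The accumulators persist across calls,
    mirroring the persistent K.variable accumulators in the graph."""
    new_params = []
    for p, g, a in zip(params, grads, accumulators):
        a += g ** 2  # in-place, like K.update(a, new_a)
        new_params.append(p - lr * g / (np.sqrt(a) + epsilon))
    return new_params

p = [np.array([1.0, 2.0])]
acc = [np.zeros(2)]  # created once, like K.zeros(shape) in get_updates()
for _ in range(3):
    g = [np.array([0.5, -0.5])]
    p = adagrad_step(p, g, acc, lr=0.1)

print(acc[0])  # [0.75 0.75]: three steps of g**2 = 0.25 accumulated
```

The zeros initialization happens exactly once, outside the loop; only the in-place updates run per iteration, which is the role the returned update ops play in Keras.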

self.weights of an optimizer class holds its internal parameters. It is not used for training itself; it just keeps the optimizer's state (a list of references to the accumulator tensors). When model.save is called, that state is saved via get_weights(), and it is restored via set_weights() when the model is loaded.
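A rough illustration of that checkpointing role (TinyOptimizer is a made-up stand-in, not the Keras class): self.weights exists only so the state can be read out on save and written back on load.

```python
import numpy as np

class TinyOptimizer:
    """Sketch: optimizer.weights is a list of state arrays exposed
    purely for checkpointing, not consulted by the training math."""
    def __init__(self):
        self.weights = []  # filled when the update logic is first built

    def build(self, shapes):
        self.weights = [np.zeros(s) for s in shapes]

    def get_weights(self):
        return [w.copy() for w in self.weights]

    def set_weights(self, values):
        for w, v in zip(self.weights, values):
            w[...] = v

opt = TinyOptimizer()
opt.build([(2,)])
opt.weights[0] += 1.0      # pretend training mutated the accumulators
saved = opt.get_weights()  # what saving a model would store

opt2 = TinyOptimizer()
opt2.build([(2,)])
opt2.set_weights(saved)    # what loading a model restores
print(opt2.weights[0])     # [1. 1.]
```

This is why self.weights = accumulators looks unused in get_updates(): its consumers are the save/load paths, not the update ops.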



Source: https://stackoverflow.com/questions/41787873/how-adagrad-works-in-keras-what-does-self-weights-mean-in-keras-optimizer
