How does Adagrad work in Keras? What does self.weights mean in a Keras Optimizer?

Submitted by 狂风中的少年 on 2019-12-21 05:16:28

Question


For example, Keras' Adagrad is implemented as follows:

from keras import backend as K
from keras.optimizers import Optimizer


class Adagrad(Optimizer):
    """Adagrad optimizer.

    It is recommended to leave the parameters of this optimizer
    at their default values.

    # Arguments
        lr: float >= 0. Learning rate.
        epsilon: float >= 0.
        decay: float >= 0. Learning rate decay over each update.

    # References
        - [Adaptive Subgradient Methods for Online Learning and Stochastic Optimization](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
    """

    def __init__(self, lr=0.01, epsilon=1e-8, decay=0., **kwargs):
        super(Adagrad, self).__init__(**kwargs)
        self.lr = K.variable(lr)
        self.epsilon = epsilon
        self.decay = K.variable(decay)
        self.initial_decay = decay
        self.iterations = K.variable(0.)

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        shapes = [K.get_variable_shape(p) for p in params]
        accumulators = [K.zeros(shape) for shape in shapes]
        self.weights = accumulators
        self.updates = []

        lr = self.lr
        if self.initial_decay > 0:
            lr *= (1. / (1. + self.decay * self.iterations))
            self.updates.append(K.update_add(self.iterations, 1))

        for p, g, a in zip(params, grads, accumulators):
            new_a = a + K.square(g)  # update accumulator
            self.updates.append(K.update(a, new_a))
            new_p = p - lr * g / (K.sqrt(new_a) + self.epsilon)
            # apply constraints
            if p in constraints:
                c = constraints[p]
                new_p = c(new_p)
            self.updates.append(K.update(p, new_p))
        return self.updates

The function get_updates() looks like a single update step. But shouldn't the accumulators store history information across steps? Why are they initialized to zeros here, as if at each step? How can they act as accumulators over the whole training process?

What does this line do?

self.weights = accumulators

It seems self.weights is never used anywhere afterwards.


Answer 1:


You are correct: for all optimizers in Keras, get_updates() implements the tensor logic for one step of updates. However, it is called only once per model.fit(), from _make_train_function(), which builds the training function by passing the returned update ops as updates=. That same update rule is then executed iteration after iteration to update the model parameters and the optimizer's internal state. So the accumulators are created as zeros only once, when the graph is built; on every subsequent iteration the K.update(a, new_a) op mutates those same persistent tensors in place, which is what makes them accumulate over the whole training process.
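To make the graph-built-once, executed-many-times behavior concrete, here is a minimal NumPy sketch of the same accumulator pattern (adagrad_step and the toy values are illustrative, not Keras API). The accumulator array is created once, like K.zeros in get_updates(), and then mutated in place on every step, like K.update:

```python
import numpy as np

def adagrad_step(params, grads, accumulators, lr=0.01, epsilon=1e-8):
    """One Adagrad step. The accumulators persist across calls,
    mirroring the persistent K.variable accumulators in the graph."""
    new_params = []
    for p, g, a in zip(params, grads, accumulators):
        a += g ** 2  # in-place, like K.update(a, new_a)
        new_params.append(p - lr * g / (np.sqrt(a) + epsilon))
    return new_params

p = [np.array([1.0, 2.0])]
acc = [np.zeros(2)]  # created once, like K.zeros(shape) in get_updates()
for _ in range(3):
    g = [np.array([0.5, -0.5])]
    p = adagrad_step(p, g, acc, lr=0.1)

print(acc[0])  # [0.75 0.75]: three steps of g**2 = 0.25 accumulated
```

The zeros initialization happens exactly once, outside the loop; only the in-place updates run per iteration, which is the role the returned update ops play in Keras.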

self.weights of an optimizer class holds its internal parameters. It is not used for training itself; it just keeps the optimizer's state (a list of references to the accumulator tensors). When model.save is called, that state is saved via get_weights(), and it is restored via set_weights() when the model is loaded.
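A rough illustration of that checkpointing role (TinyOptimizer is a made-up stand-in, not the Keras class): self.weights exists only so the state can be read out on save and written back on load.

```python
import numpy as np

class TinyOptimizer:
    """Sketch: optimizer.weights is a list of state arrays exposed
    purely for checkpointing, not consulted by the training math."""
    def __init__(self):
        self.weights = []  # filled when the update logic is first built

    def build(self, shapes):
        self.weights = [np.zeros(s) for s in shapes]

    def get_weights(self):
        return [w.copy() for w in self.weights]

    def set_weights(self, values):
        for w, v in zip(self.weights, values):
            w[...] = v

opt = TinyOptimizer()
opt.build([(2,)])
opt.weights[0] += 1.0      # pretend training mutated the accumulators
saved = opt.get_weights()  # what saving a model would store

opt2 = TinyOptimizer()
opt2.build([(2,)])
opt2.set_weights(saved)    # what loading a model restores
print(opt2.weights[0])     # [1. 1.]
```

This is why self.weights = accumulators looks unused in get_updates(): its consumers are the save/load paths, not the update ops.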



Source: https://stackoverflow.com/questions/41787873/how-adagrad-works-in-keras-what-does-self-weights-mean-in-keras-optimizer
