I want to use MomentumOptimizer in Tensorflow. However, since this optimizer uses some internal variables, attempting to use it without initializing them yields a FailedPreconditionError (attempting to use an uninitialized value). Initializing everything with tf.global_variables_initializer() works, but it also re-initializes my model's variables. Is there a way to initialize only the optimizer's variables?
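A minimal example of what I mean (the variable names are just illustrative):
import tensorflow as tf

x = tf.Variable(1.0)
loss = tf.square(x)
step = tf.train.MomentumOptimizer(learning_rate=0.1, momentum=0.9).minimize(loss)

sess = tf.Session()
sess.run(x.initializer)  # initializes x, but not the optimizer's slot variable
sess.run(step)  # FailedPreconditionError: the 'Momentum' slot is uninitialized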
You can filter variables by name and only initialize those, e.g.:
momentum_initializers = [var.initializer for var in tf.global_variables() if 'Momentum' in var.name]
sess.run(momentum_initializers)
There is a more straightforward way:
optimizer = tf.train.AdamOptimizer()
session.run(tf.variables_initializer(optimizer.variables()))
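Note that optimizer.variables() is only populated once the optimizer's variables have been created, i.e. after minimize() or apply_gradients() has been called; a minimal sketch (names illustrative):
import tensorflow as tf

x = tf.Variable(1.0)
loss = tf.square(x)

optimizer = tf.train.AdamOptimizer()
step = optimizer.minimize(loss)  # this call creates the optimizer's variables

# build the re-initializer only after minimize()/apply_gradients():
reset_opt = tf.variables_initializer(optimizer.variables())

session = tf.Session()
session.run(tf.global_variables_initializer())
session.run(step)
session.run(reset_opt)  # resets only the optimizer's state, not x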
Both current answers kinda work by filtering the variable name using the 'Momentum' string. But that is very brittle in two ways: it could silently (re-)initialize other variables that merely happen to have 'Momentum' in their name, and it only finds the state of this particular optimizer under this particular naming scheme.
Fortunately, tensorflow's abstract Optimizer class has a mechanism for exactly this: these extra optimizer variables are called "slots", and you can get all slot names of an optimizer using the get_slot_names() method:
opt = tf.train.MomentumOptimizer(...)
print(opt.get_slot_names())
# prints ['momentum']
And you can get the variable corresponding to the slot for a specific (trainable) variable v
using the get_slot(var, slot_name) method:
opt.get_slot(some_var, 'momentum')
Putting all this together, you can create an op that initializes the optimizer's state as follows:
var_list = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)  # vars to optimize
opt = tf.train.MomentumOptimizer(0.1, 0.95)
step_op = opt.minimize(loss, var_list=var_list)
reset_opt_op = tf.variables_initializer([opt.get_slot(var, name)
                                         for name in opt.get_slot_names()
                                         for var in var_list])
This will really reset only the correct variables, and it is robust across optimizers.
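For example, to reset the optimizer state between two training runs (a usage sketch building on the snippet above; the step count is arbitrary):
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for _ in range(100):  # first training run
    sess.run(step_op)
sess.run(reset_opt_op)  # momentum slots are back at zero; model weights are untouched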
Except for one unfortunate caveat: AdamOptimizer. That one also keeps a counter for how often it's been called. That means you should really think hard about what you're doing here anyway, but for completeness' sake, you can get its extra state with opt._get_beta_accumulators(). The returned list should be added to the list in the reset_opt_op line above.
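For completeness, a sketch of the Adam variant (keep in mind that _get_beta_accumulators() is a private API and may change between TensorFlow versions):
opt = tf.train.AdamOptimizer(0.001)
step_op = opt.minimize(loss, var_list=var_list)
opt_vars = [opt.get_slot(var, name)
            for name in opt.get_slot_names()
            for var in var_list]
# Adam also keeps beta1^t and beta2^t accumulators that act as the step counter:
opt_vars.extend(opt._get_beta_accumulators())
reset_opt_op = tf.variables_initializer(opt_vars)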
tf.variables_initializer seems to be the preferred way to initialize a specific set of variables:
var_list = [var for var in tf.global_variables() if 'Momentum' in var.name]
var_list_init = tf.variables_initializer(var_list)
...
sess = tf.Session()
sess.run(var_list_init)
To fix the problem of get_slot(var, name) returning None for slots that don't exist for a given variable, filter out the None entries:
self.opt_vars = [opt.get_slot(var, name) for name in opt.get_slot_names()
for var in self.vars_to_train
if opt.get_slot(var, name) is not None]
Building off of LucasB's answer about AdamOptimizer, this function takes an AdamOptimizer instance adam_opt whose Variables have been created (one of these two must have been called: adam_opt.minimize(loss, var_list=var_list) or adam_opt.apply_gradients(zip(grads, var_list))). The function creates an Op that, when run, re-initializes the optimizer's variables for the passed variables, as well as the global counting state.
def adam_variables_initializer(adam_opt, var_list):
    adam_vars = [adam_opt.get_slot(var, name)
                 for name in adam_opt.get_slot_names()
                 for var in var_list
                 if adam_opt.get_slot(var, name) is not None]
    adam_vars.extend(list(adam_opt._get_beta_accumulators()))
    return tf.variables_initializer(adam_vars)
e.g.:
opt = tf.train.AdamOptimizer(learning_rate=1e-4)
fit_op = opt.minimize(loss, var_list=var_list)
reset_opt_vars = adam_variables_initializer(opt, var_list)
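Usage in a session might then look like this (a sketch; the number of steps is arbitrary):
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        sess.run(fit_op)
    # continue from the current weights, but with Adam's moments and step counter reset:
    sess.run(reset_opt_vars)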