I'd like to print out the learning rate for each training step of my neural network.
I know that Adam has an adaptive learning rate, but is there a way I can see this (for visualization)?
All the optimizers have a private variable that holds the value of the learning rate. In Adagrad and gradient descent it is called self._learning_rate; in Adam it is self._lr.
So you just need to print sess.run(optimizer._lr) to get this value. sess.run is needed when the learning rate is a tensor (for example, when a decay schedule was passed in); if you passed a plain float, you can print optimizer._lr directly.
Sung Kim's suggestion worked for me; my exact steps were:
lr = 0.1
step_rate = 1000
decay = 0.95
global_step = tf.Variable(0, trainable=False)
increment_global_step = tf.assign(global_step, global_step + 1)
learning_rate = tf.train.exponential_decay(lr, global_step, step_rate, decay, staircase=True)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, epsilon=0.01)
trainer = optimizer.minimize(loss_function)
# Some code here
print('Learning rate: %f' % (sess.run(optimizer._lr)))
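Note that with this setup the decayed value only changes when global_step advances, so the loop also has to run increment_global_step (or pass global_step to minimize). A hedged sketch of such a loop, continuing from the variables above, where feed stands in for your actual feed_dict:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(10000):
        # Run the training op and bump the global step so the decay advances.
        sess.run([trainer, increment_global_step], feed_dict=feed)
        if step % step_rate == 0:
            # Equivalent to sess.run(learning_rate) here, since _lr holds that tensor.
            print('Learning rate: %f' % sess.run(optimizer._lr))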
I think the easiest thing you can do is subclass the optimizer.
It has several methods that, I guess, get dispatched to based on variable type. Regular dense variables seem to go through _apply_dense; this solution won't work for sparse variables or other cases.
If you look at the implementation, you can see that it stores the m and v EMAs in these "slots". So something like this seems to do it:
class MyAdam(tf.train.AdamOptimizer):
    def _apply_dense(self, grad, var):
        # Read the moment accumulators from the optimizer's slots and
        # apply the usual bias correction.
        m = self.get_slot(var, "m")
        v = self.get_slot(var, "v")
        m_hat = m / (1 - self._beta1_power)
        v_hat = v / (1 - self._beta2_power)

        # The normalized step that gets scaled by the learning rate.
        step = m_hat / (v_hat**0.5 + self._epsilon_t)

        # Use a histogram summary to monitor it during training.
        tf.summary.histogram("hist", step)

        return super(MyAdam, self)._apply_dense(grad, var)
step here is (approximately) in the interval [-1, 1]; it is what gets multiplied by the learning rate to determine the actual step applied to the parameters.
There's often no node in the graph for it because there is one big training_ops.apply_adam
that does everything.
Here I'm just creating a histogram summary from it. But you could stick it in a dictionary attached to the object and read it later or do whatever you want with it.
Dropping that into mnist_deep.py and adding some summaries to the training loop:
all_summaries = tf.summary.merge_all()
file_writer = tf.summary.FileWriter("/tmp/Adam")
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(20000):
        batch = mnist.train.next_batch(50)
        if i % 100 == 0:
            train_accuracy, summaries = sess.run(
                [accuracy, all_summaries],
                feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
            file_writer.add_summary(summaries, i)
            print('step %d, training accuracy %g' % (i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
In TensorBoard this produces a histogram of the per-parameter steps over the course of training.
In TensorFlow 2:
optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.1) # or any other optimizer
print(optimizer.learning_rate.numpy()) # or print(optimizer.lr.numpy())
Note: This gives you the base learning rate. Refer to this answer for more details on adaptive learning rates.
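If you want the per-step value in TF 2 when a schedule is attached (a sketch, assuming tf.keras.optimizers.schedules.ExponentialDecay), you can evaluate the schedule at the optimizer's current iteration count:
import tensorflow as tf

schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1, decay_steps=1000, decay_rate=0.95, staircase=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)

# optimizer.learning_rate is the schedule object here; calling it with the
# current iteration gives the decayed value at this point in training.
current_lr = optimizer.learning_rate(optimizer.iterations)
print(float(current_lr))
Note this is the scheduled rate, not Adam's per-parameter adaptive step.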
In the TensorFlow sources, the current learning rate for the Adam optimizer is calculated as:
lr = (lr_t * math_ops.sqrt(1 - beta2_power) / (1 - beta1_power))
So, try:
current_lr = (optimizer._lr_t * tf.sqrt(1 - optimizer._beta2_power)
              / (1 - optimizer._beta1_power))
eval_current_lr = sess.run(current_lr)
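As a quick sanity check of that formula in plain Python (using Adam's default hyperparameters, chosen here just for illustration): the bias-corrected rate stays below the base lr early in training and approaches it as t grows.
import math

lr_t, beta1, beta2 = 0.001, 0.9, 0.999  # Adam defaults, for illustration only

for t in (1, 10, 100, 1000):
    beta1_power, beta2_power = beta1 ** t, beta2 ** t
    current_lr = lr_t * math.sqrt(1 - beta2_power) / (1 - beta1_power)
    print(t, current_lr)  # e.g. ~0.000316 at t=1, ~0.000795 at t=1000, approaching lr_t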