I am new to tensorflow and want to implement early stopping in tf.estimator.DNNRegressor
with available training hooksTraining Hooks for the MNIST dataset. The ear
Here is a EarlyStoppingHook
sample implementation:
import numpy as np
import tensorflow as tf
import logging
from tensorflow.python.training import session_run_hook
class EarlyStoppingHook(session_run_hook.SessionRunHook):
"""Hook that requests stop at a specified step."""
def __init__(self, monitor='val_loss', min_delta=0, patience=0,
mode='auto'):
"""
"""
self.monitor = monitor
self.patience = patience
self.min_delta = min_delta
self.wait = 0
if mode not in ['auto', 'min', 'max']:
logging.warning('EarlyStopping mode %s is unknown, '
'fallback to auto mode.', mode, RuntimeWarning)
mode = 'auto'
if mode == 'min':
self.monitor_op = np.less
elif mode == 'max':
self.monitor_op = np.greater
else:
if 'acc' in self.monitor:
self.monitor_op = np.greater
else:
self.monitor_op = np.less
if self.monitor_op == np.greater:
self.min_delta *= 1
else:
self.min_delta *= -1
self.best = np.Inf if self.monitor_op == np.less else -np.Inf
def begin(self):
# Convert names to tensors if given
graph = tf.get_default_graph()
self.monitor = graph.as_graph_element(self.monitor)
if isinstance(self.monitor, tf.Operation):
self.monitor = self.monitor.outputs[0]
def before_run(self, run_context): # pylint: disable=unused-argument
return session_run_hook.SessionRunArgs(self.monitor)
def after_run(self, run_context, run_values):
current = run_values.results
if self.monitor_op(current - self.min_delta, self.best):
self.best = current
self.wait = 0
else:
self.wait += 1
if self.wait >= self.patience:
run_context.request_stop()
This implementation is based on Keras implementation.
To use it with CNN MNIST example create hook and pass it to train
.
early_stopping_hook = EarlyStoppingHook(monitor='sparse_softmax_cross_entropy_loss/value', patience=10)
mnist_classifier.train(
input_fn=train_input_fn,
steps=20000,
hooks=[logging_hook, early_stopping_hook])
Here sparse_softmax_cross_entropy_loss/value
is the name of the loss op in that example.
EDIT 1:
It looks like there is no "official" way of finding loss node when using estimators (or I can't find it).
For the DNNRegressor
this node has name dnn/head/weighted_loss/Sum
.
Here is how to find it in the graph:
Start tensorboard in model directory. In my case I didn't set any directory so estimator used temporary directory and printed this line:
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpInj8SC
Start tensorboard:
tensorboard --logdir /tmp/tmpInj8SC
Open it in browser and navigate to GRAPHS tab.
Find loss in the graph. Expand blocks in the sequence: dnn
→ head
→ weighted_loss
and click on the Sum
node (note that there is summary node named loss
connected to it).
Name shown in the info "window" to the right is the name of the selected node, that need to be passed to monitor
argument pf EarlyStoppingHook
.
Loss node of the DNNClassifier
has the same name by default. Both DNNClassifier
and DNNRegressor
have optional argument loss_reduction
that influences loss node name and behavior (defaults to losses.Reduction.SUM
).
EDIT 2:
There is a way of finding loss without looking at the graph.
You can use GraphKeys.LOSSES
collection to get the loss. But this way will work only after training started. So you can use it only in a hook.
For example you can remove monitor
argument from the EarlyStoppingHook
class and change its begin
function to always use the first loss in the collection:
self.monitor = tf.get_default_graph().get_collection(tf.GraphKeys.LOSSES)[0]
You also probably need to check that there is a loss in the collection.