I'm trying to implement a few different models and train them on CIFAR-10, and I want to use TF-slim to do this. It looks like TF-slim has two main loops that are useful during training.
Thanks to @kmalakoff, the TensorFlow issue gives a brilliant solution to the problem of how to validate or test a model during tf.slim training. The main idea is to override the train_step_fn function:
import tensorflow as tf
from tensorflow.contrib.slim.python.slim.learning import train_step

slim = tf.contrib.slim
...
accuracy_validation = ...
accuracy_test = ...

def train_step_fn(session, *args, **kwargs):
    # Run slim's regular training step first.
    total_loss, should_stop = train_step(session, *args, **kwargs)

    # Every validation_every_n_step steps, evaluate the validation tensor.
    if train_step_fn.step % FLAGS.validation_every_n_step == 0:
        accuracy = session.run(train_step_fn.accuracy_validation)
        print('your validation info')

    # Every test_every_n_step steps, evaluate the test tensor.
    if train_step_fn.step % FLAGS.test_every_n_step == 0:
        accuracy = session.run(train_step_fn.accuracy_test)
        print('your test info')

    train_step_fn.step += 1
    return [total_loss, should_stop]

# Attach state to the function object so it is available inside the hook.
train_step_fn.step = 0
train_step_fn.accuracy_validation = accuracy_validation
train_step_fn.accuracy_test = accuracy_test

# Run training.
slim.learning.train(
    train_op,
    FLAGS.logs_dir,
    train_step_fn=train_step_fn,
    graph=graph,
    number_of_steps=FLAGS.max_steps)
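In the snippet above, accuracy_validation and accuracy_test are left as placeholders. A minimal sketch of one way to define such a tensor, assuming logits and labels tensors coming from your validation input pipeline (both names are assumptions, not part of the original answer):

# Hypothetical accuracy tensor: fraction of correct predictions in a batch.
predictions = tf.argmax(logits, 1)
accuracy_validation = tf.reduce_mean(
    tf.cast(tf.equal(predictions, tf.cast(labels, tf.int64)), tf.float32))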
evaluation_loop is meant to be used (as you are currently using it) with a single directory. If you want to be more efficient, you could use slim.evaluation.evaluate_once and add your own logic for swapping directories, as sketched below.
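A minimal sketch, assuming the same setup as the validation example further down (names_to_updates, FLAGS.num_eval_batches), with FLAGS.train_log_dir and FLAGS.eval_log_dir as hypothetical flags: run one evaluation pass per call, and switch the input data directory between calls when you rebuild the eval graph.

# Evaluate the latest checkpoint exactly once; call again after swapping
# the input data source (e.g. validation vs. test files).
checkpoint_path = tf.train.latest_checkpoint(FLAGS.train_log_dir)
slim.evaluation.evaluate_once(
    '',  # master: run locally
    checkpoint_path,
    logdir=FLAGS.eval_log_dir,
    num_evals=FLAGS.num_eval_batches,
    eval_op=names_to_updates.values())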
You can do this by supplying the train_step_fn argument to slim.learning.train(...). This argument replaces the default train_step function with a custom one, which can compute and return the total_loss and should_stop values however you see fit.
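For instance, a minimal sketch of a custom train_step_fn that requests early stopping once the loss drops below a threshold (FLAGS.target_loss is a hypothetical flag, and train_step is the same import used in the first answer):

def early_stop_train_step_fn(session, *args, **kwargs):
    # Delegate to slim's default step, then decide whether to stop.
    total_loss, should_stop = train_step(session, *args, **kwargs)
    if total_loss < FLAGS.target_loss:  # hypothetical stopping threshold
        should_stop = True
    return [total_loss, should_stop]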
Your workflow looks great; this is probably the most common workflow for learning/eval using TF-Slim.
Adding my 2 cents:
I currently have this model for the evaluation_loop hogging up an entire GPU, but it's rarely being used
Usually an evaluation model takes less GPU memory. You can prevent TF from hogging the whole GPU memory by setting the session config allow_growth to True. This way you can use the same GPU for both training and evaluation.
Example @ Training
# Let the training session allocate GPU memory on demand rather than
# grabbing the whole GPU up front.
session_config = tf.ConfigProto()
session_config.gpu_options.allow_growth = True

slim.learning.train(
    train_tensor,
    logdir=train_log_dir,
    local_init_op=tf.initialize_local_variables(),
    save_summaries_secs=FLAGS.save_summaries_secs,
    save_interval_secs=FLAGS.save_interval_secs,
    session_config=session_config)
Example @ validation
# Same trick on the evaluation side, so both processes can share one GPU.
session_config = tf.ConfigProto()
session_config.gpu_options.allow_growth = True

slim.evaluation.evaluation_loop(
    '',
    checkpoint_dir=train_log_dir,
    logdir=train_log_dir,
    num_evals=FLAGS.num_eval_batches,
    eval_op=names_to_updates.values(),
    summary_op=tf.merge_summary(summary_ops),
    eval_interval_secs=FLAGS.eval_interval_secs,
    session_config=session_config)