How to use evaluation_loop with train_loop in tf-slim

  1. evaluation_loop is meant to be used (as you are currently using it) with a single directory. If you want to be more efficient, you could use slim.evaluation.evaluate_once and add whatever directory-swapping logic suits your setup (see the sketch after this list).

  2. You can do this by overriding the slim.learning.train(..., train_step_fn) argument. This argument replaces the default 'train_step' function with a custom one. Here, you can supply a custom training function that returns the 'total_loss' and 'should_stop' values as you see fit.

  3. Your workflow looks great; this is probably the most common workflow for learning/eval with TF-Slim.
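
For point 1, here is a minimal sketch of how slim.evaluation.evaluate_once could be driven over several checkpoint directories. evaluate_checkpoint_dirs is a hypothetical helper; the directory-swapping policy is whatever you find appropriate:

import tensorflow as tf
import tensorflow.contrib.slim as slim

def evaluate_checkpoint_dirs(checkpoint_dirs, logdir, num_evals, eval_op):
    # Hypothetical helper: run a single evaluation pass on the latest
    # checkpoint found in each directory.
    for checkpoint_dir in checkpoint_dirs:
        checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir)
        if checkpoint_path is None:
            continue  # no checkpoint written to this directory yet
        slim.evaluation.evaluate_once(
            '',  # master
            checkpoint_path,
            logdir,
            num_evals=num_evals,
            eval_op=eval_op)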

Thanks to @kmalakoff, the TensorFlow issue gave a brilliant way to solve the problem of how to validate or test a model during tf.slim training. The main idea is to override the train_step_fn function:

import tensorflow as tf
import tensorflow.contrib.slim as slim
from tensorflow.contrib.slim.python.slim.learning import train_step

...

accuracy_validation = ...
accuracy_test = ...

def train_step_fn(session, *args, **kwargs):
    # Run the standard slim training step first.
    total_loss, should_stop = train_step(session, *args, **kwargs)

    # Every validation_every_n_step steps, run the validation metric.
    if train_step_fn.step % FLAGS.validation_every_n_step == 0:
        accuracy = session.run(train_step_fn.accuracy_validation)
        print('your validation info')

    # Every test_every_n_step steps, run the test metric.
    if train_step_fn.step % FLAGS.test_every_n_step == 0:
        accuracy = session.run(train_step_fn.accuracy_test)
        print('your test info')

    train_step_fn.step += 1
    return [total_loss, should_stop]

# Attach the step counter and evaluation tensors to the function object
# so they are accessible inside train_step_fn.
train_step_fn.step = 0
train_step_fn.accuracy_validation = accuracy_validation
train_step_fn.accuracy_test = accuracy_test

# run training.
slim.learning.train(
    train_op,
    FLAGS.logs_dir,
    train_step_fn=train_step_fn,
    graph=graph,
    number_of_steps=FLAGS.max_steps)

Adding my 2 cents:

"I currently have this model for the evaluation_loop hogging up an entire GPU, but it's rarely being used"

Usually an evaluation model takes less GPU memory. You can prevent TF from grabbing the whole GPU memory by setting allow_growth to True in the session config. This way you can use the same GPU for both training and evaluation.

Example @ Training

session_config = tf.ConfigProto()
session_config.gpu_options.allow_growth = True

slim.learning.train(train_tensor,
                    logdir=train_log_dir,
                    local_init_op=tf.local_variables_initializer(),
                    save_summaries_secs=FLAGS.save_summaries_secs,
                    save_interval_secs=FLAGS.save_interval_secs,
                    session_config=session_config)

Example @ Validation

session_config = tf.ConfigProto()
session_config.gpu_options.allow_growth = True

slim.evaluation.evaluation_loop(
    '',
    checkpoint_dir=train_log_dir,
    logdir=train_log_dir,
    num_evals=FLAGS.num_eval_batches,
    eval_op=list(names_to_updates.values()),
    summary_op=tf.summary.merge(summary_ops),
    eval_interval_secs=FLAGS.eval_interval_secs,
    session_config=session_config)