How to use evaluation_loop with train_loop in tf-slim

Asked by 长情又很酷, 2021-02-04 10:57

I'm trying to implement a few different models and train them on CIFAR-10, and I want to use TF-slim to do this. It looks like TF-slim has two main loops that are useful during training and evaluation.

3 Answers
  • 2021-02-04 11:44

    Thanks to @kmalakoff, the TensorFlow issue thread gives a neat solution to the problem of validating or testing a model during tf.slim training. The main idea is to override the train_step_fn function:

    import …
    from tensorflow.contrib.slim.python.slim.learning import train_step
    
    ...
    
    accuracy_validation = ...
    accuracy_test = ...
    
    def train_step_fn(session, *args, **kwargs):
        # Run the regular training step first.
        total_loss, should_stop = train_step(session, *args, **kwargs)
    
        # Periodically evaluate on the validation set.
        if train_step_fn.step % FLAGS.validation_every_n_step == 0:
            accuracy = session.run(train_step_fn.accuracy_validation)
            print('your validation info')
    
        # Periodically evaluate on the test set.
        if train_step_fn.step % FLAGS.test_every_n_step == 0:
            accuracy = session.run(train_step_fn.accuracy_test)
            print('your test info')
    
        train_step_fn.step += 1
        return [total_loss, should_stop]
    
    # State is kept as attributes on the function object itself.
    train_step_fn.step = 0
    train_step_fn.accuracy_validation = accuracy_validation
    train_step_fn.accuracy_test = accuracy_test
    
    # Run training.
    slim.learning.train(
        train_op,
        FLAGS.logs_dir,
        train_step_fn=train_step_fn,
        graph=graph,
        number_of_steps=FLAGS.max_steps)
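The snippet above relies on a plain-Python mechanism: attributes attached to a function object persist across calls, so `train_step_fn.step` works as a counter. A standalone illustration of that pattern (the names below are hypothetical, not part of TF-Slim):

```python
def step_fn(loss):
    # Read and update state stored on the function object itself.
    step_fn.history.append(loss)
    if step_fn.step % step_fn.every_n == 0:
        step_fn.evaluations += 1  # stand-in for session.run(validation_op)
    step_fn.step += 1
    return loss

# Initialise the attributes before the loop, just like train_step_fn.step = 0.
step_fn.step = 0
step_fn.every_n = 3
step_fn.evaluations = 0
step_fn.history = []

for i in range(10):
    step_fn(float(i))

print(step_fn.step)         # 10
print(step_fn.evaluations)  # fires at steps 0, 3, 6, 9 -> 4
```

This is why the attributes must be set before `slim.learning.train` starts calling the function.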
    
  • 2021-02-04 11:51
    1. evaluation_loop is meant to be used (as you are currently using it) with a single directory. If you want to be more efficient, you could use slim.evaluation.evaluate_once and add the appropriate logic for swapping directories as you find appropriate.

    2. You can do this by overriding the slim.learning.train(..., train_step_fn) argument. This argument replaces the 'train_step' function with a custom one. Here, you can supply a custom training function that returns the 'total_loss' and 'should_stop' values as you see fit.

    3. Your workflow looks great; this is probably the most common workflow for learning/eval using TF-Slim.
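For point 1, the swapping logic around slim.evaluation.evaluate_once would have to pick which directory to evaluate next. A TF-free sketch of one such helper (hypothetical, assuming TF-Slim's usual `model.ckpt-<step>` file naming):

```python
import os
import re

def newest_checkpoint(dirs):
    # Scan several checkpoint directories and return (directory, step) of
    # the checkpoint with the highest global step, or None if none exist.
    best = None
    for d in dirs:
        if not os.path.isdir(d):
            continue
        for name in os.listdir(d):
            m = re.match(r'model\.ckpt-(\d+)\.index$', name)
            if m:
                step = int(m.group(1))
                if best is None or step > best[1]:
                    best = (d, step)
    return best

# Usage sketch: evaluate whichever run has the freshest checkpoint.
# target = newest_checkpoint(['/tmp/run_a', '/tmp/run_b'])
# if target is not None:
#     ckpt = os.path.join(target[0], 'model.ckpt-%d' % target[1])
#     slim.evaluation.evaluate_once('', ckpt, eval_log_dir,
#                                   num_evals=FLAGS.num_eval_batches,
#                                   eval_op=names_to_updates.values())
```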

  • 2021-02-04 11:53

    Adding my 2 cents:

    "I currently have this model for the evaluation_loop hogging up an entire GPU, but it's rarely being used"

    Usually an evaluation model needs less GPU memory than training. You can prevent TF from grabbing all of the GPU's memory by setting allow_growth to True in the session config; this way the same GPU can serve both training and evaluation.

    Example @ Training

    session_config = tf.ConfigProto()
    session_config.gpu_options.allow_growth = True
    
    slim.learning.train(
        train_tensor,
        logdir=train_log_dir,
        local_init_op=tf.initialize_local_variables(),
        save_summaries_secs=FLAGS.save_summaries_secs,
        save_interval_secs=FLAGS.save_interval_secs,
        session_config=session_config)
    

    Example @ validation

    session_config = tf.ConfigProto()
    session_config.gpu_options.allow_growth = True
    
    slim.evaluation.evaluation_loop(
          '',
          checkpoint_dir=train_log_dir,
          logdir=train_log_dir,
          num_evals=FLAGS.num_eval_batches,
          eval_op=names_to_updates.values(),
          summary_op=tf.merge_summary(summary_ops),
          eval_interval_secs=FLAGS.eval_interval_secs,
          session_config=session_config)
    