Question
The first chapter of Deep Learning with TensorFlow gives an example of how to build a simple neural network for recognizing handwritten digits. According to its description, the code bundle for the book can be found on GitHub.
From the context, I think the section Running a simple TensorFlow 2.0 net and establishing a baseline uses the same code as Deep-Learning-with-TensorFlow-2-and-Keras/mnist_V1.py. When I run this example code, it gives me the following output:
The snapshot from the book is:
The result in my screenshot is 375/375, while in the snapshot from the book it is 48000/48000. Also, I am missing the line Train on 48000 samples, validate on 12000 samples. Why does this happen? How can I get the same result as the snapshot from the book?
From my output, I think the size of my loaded datasets is the same as what the code comments describe:
from tensorflow import keras

# loading MNIST dataset
# verify
# the split between train and test is 60,000 and 10,000 respectively
# one-hot is automatically applied
mnist = keras.datasets.mnist
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
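As a quick sanity check (my own addition, not part of the book's code), the array shapes confirm the 60,000/10,000 split:

print(X_train.shape, Y_train.shape)  # (60000, 28, 28) (60000,)
print(X_test.shape, Y_test.shape)    # (10000, 28, 28) (10000,)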
My package versions:
$ python --version
Python 3.6.8
$ python3 -c 'import tensorflow as tf; print(tf.__version__)'
2.3.1
$ python3 -c 'import tensorflow as tf; print(tf.keras.__version__)'
2.4.0
I tried to find the answer in the source code. The fit method is defined in training.py. In this method, a CallbackList object is instantiated, which in turn creates a ProgbarLogger.
# training.py, class Model, method fit
# Container that configures and calls `tf.keras.Callback`s.
if not isinstance(callbacks, callbacks_module.CallbackList):
    callbacks = callbacks_module.CallbackList(
        callbacks,
        add_history=True,
        add_progbar=verbose != 0,
        model=self,
        verbose=verbose,
        epochs=epochs,
        steps=data_handler.inferred_steps)
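Note the add_progbar=verbose != 0 argument: whether a progress bar is attached at all depends on verbose. A hypothetical call of my own, assuming model, X_train and Y_train as in mnist_V1.py:

# verbose=1 (default): per-step progress bar, e.g. 375/375 [=====...]
# verbose=2: one summary line per epoch, no per-step bar
# verbose=0: silent; no ProgbarLogger is attached
model.fit(X_train, Y_train, batch_size=128, epochs=1,
          validation_split=0.2, verbose=2)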
# callbacks.py, class ProgbarLogger
def on_epoch_begin(self, epoch, logs=None):
    self._reset_progbar()
    if self.verbose and self.epochs > 1:
        print('Epoch %d/%d' % (epoch + 1, self.epochs))

def on_train_batch_end(self, batch, logs=None):
    self._batch_update_progbar(batch, logs)

def _batch_update_progbar(self, batch, logs=None):
    # ...
    if self.verbose == 1:
        # Only block async when verbose = 1.
        logs = tf_utils.to_numpy_or_python_type(logs)
        self.progbar.update(self.seen, list(logs.items()), finalize=False)
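To see what on_train_batch_end receives at each step, here is a minimal sketch of my own (not Keras internals) that records the batch indices; with 48,000 training samples and batch_size=128, the last index is 374, i.e. 375 steps in total:

import tensorflow as tf

class BatchCounter(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        self.last_batch = None

    def on_train_batch_end(self, batch, logs=None):
        self.last_batch = batch  # 0, 1, 2, ..., 374

    def on_epoch_end(self, epoch, logs=None):
        print('last batch index:', self.last_batch)

# usage: model.fit(..., callbacks=[BatchCounter()])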
ProgbarLogger then calls the update method of Progbar to update the progress bar.
# generic_utils.py, class Progbar, method update
if self.verbose == 1:
    # ...
    if self.target is not None:
        numdigits = int(np.log10(self.target)) + 1
        bar = ('%' + str(numdigits) + 'd/%d [') % (current, self.target)
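Progbar is also exposed as the public tf.keras.utils.Progbar, so the %d/%d rendering can be reproduced standalone. A small sketch of my own: with target=375, the bar prints 375/375 [=====...], matching my output:

import time
from tensorflow.keras.utils import Progbar

bar = Progbar(target=375)
for step in range(375):
    time.sleep(0.001)  # stand-in for one training step
    bar.update(step + 1)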
375 is the value of self.target. I then found that the value of self.target is passed in through the steps parameter of the CallbackList object; in the first code snippet, you can see steps=data_handler.inferred_steps. The property inferred_steps is defined in data_adapter.py.
@property
def inferred_steps(self):
    """The inferred steps per epoch of the created `Dataset`.

    This will be `None` in the case where:

    (1) A `Dataset` of unknown cardinality was passed to the `DataHandler`, and
    (2) `steps_per_epoch` was not provided, and
    (3) The first epoch of iteration has not yet completed.

    Returns:
      The inferred steps per epoch of the created `Dataset`.
    """
    return self._inferred_steps
I got lost on how self._inferred_steps is calculated.
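My rough understanding (an assumption on my part, not the exact DataHandler code): for NumPy inputs, Keras wraps the arrays in a batched tf.data.Dataset, and the steps per epoch can then be read from the dataset's cardinality:

import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices(tf.zeros([48000, 784])).batch(128)
print(tf.data.experimental.cardinality(ds).numpy())  # 375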
I also think the missing line is related to training_arrays_v1.py, but I don't know what V1 means.
def _print_train_info(num_samples_or_steps, val_samples_or_steps, is_dataset):
    increment = 'steps' if is_dataset else 'samples'
    msg = 'Train on {0} {increment}'.format(
        num_samples_or_steps, increment=increment)
    if val_samples_or_steps:
        msg += ', validate on {0} {increment}'.format(
            val_samples_or_steps, increment=increment)
    print(msg)
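For reference, calling this helper directly with the numbers from the book produces exactly the line I am missing:

_print_train_info(48000, 12000, is_dataset=False)
# Train on 48000 samples, validate on 12000 samples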
Answer 1:
Good question. Let us break it into smaller parts.

You train on 48,000 samples and validate on 12,000. However, your code displays 375 instead of 48,000. If you look at the batch size, its value is 128. A quick division: 48,000 // 128 = 375.

Your code is correct, which is good.
The difference comes from the fact that, in older versions of Keras and TensorFlow, the progress bar counted samples (all 48,000 of them), regardless of the batch_size used; it was updated like 0, 128, 256, ..., up to 48,000.

In more recent versions, the steps_per_epoch and validation_steps values are the number of samples (say 48,000) divided by the batch_size (say 128), hence the 375.

Both displays are correct; it is just a matter of different progress bars. I personally prefer the latter: with a batch_size of 128, counting 1, 2, 3, ..., 375 steps is the more natural logic.
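To make the arithmetic explicit, a small sketch of my own using the numbers from the question:

import math

num_train, num_val, batch_size = 48000, 12000, 128

# old-style bar counted samples: 0, 128, 256, ..., 48000
# new-style bar counts steps (batches) per epoch:
steps_per_epoch = math.ceil(num_train / batch_size)  # 375
validation_steps = math.ceil(num_val / batch_size)   # 94
print(steps_per_epoch, validation_steps)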
Update for further clarification:
Here is a detailed description of the model.fit() arguments:
https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit
steps_per_epoch: Integer or None. Total number of steps (batches of samples) before declaring one epoch finished and starting the next epoch.
validation_steps: Only relevant if validation_data is provided and is a tf.data dataset. Total number of steps (batches of samples) to draw before stopping when performing validation at the end of every epoch.
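A hypothetical illustration of these arguments (my own sketch, assuming a compiled model and the arrays from mnist_V1.py): with a repeating tf.data dataset, steps_per_epoch fixes the progress-bar target directly:

import tensorflow as tf

train_ds = (tf.data.Dataset.from_tensor_slices((X_train, Y_train))
            .shuffle(60000)
            .batch(128)
            .repeat())
model.fit(train_ds, epochs=1, steps_per_epoch=375)  # bar shows 375/375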
Source: https://stackoverflow.com/questions/65462757/why-is-the-result-of-the-code-offered-by-deep-learning-with-tensorflow-different