Why does dataset.output_shapes return Dimension(None) after batching?

故里飘歌 2021-01-20 00:19

I'm using the Dataset API for input pipelines in TensorFlow (version: r1.2). I built my dataset and batched it with a batch size of 128. The dataset is then fed into an RNN.

2 Answers
  • 2021-01-20 00:46

    This is now supported through the drop_remainder parameter of batch(). With drop_remainder=True every batch has exactly batch_size elements, so the batch dimension is statically known:

    batch_test_dataset = test_dataset.batch(FLAGS.batch_size, drop_remainder=True)
    

    From the docs:

    drop_remainder: (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than batch_size elements; the default behavior is not to drop the smaller batch.

  • 2021-01-20 00:55

    The batch dimension is hardcoded to None in the implementation, so output_shapes always reports None for it (TF 1.3):

    def _padded_shape_to_batch_shape(s):
      return tensor_shape.vector(None).concatenate(
          tensor_util.constant_value_as_shape(s))
    

    This lets batch() include every element, even when the last batch is smaller than the rest (e.g. dataset_size=14, batch_size=5, last_batch_size=4).
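    The arithmetic behind that example can be sketched in plain Python. This is only a toy model, not TensorFlow code: batch_sizes and static_batch_dim are hypothetical helper names, and the drop_remainder argument merely mirrors the name of the batch() option. It shows why the leading dimension cannot be fixed in advance unless short batches are discarded.

```python
from typing import List, Optional


def batch_sizes(dataset_size: int, batch_size: int,
                drop_remainder: bool = False) -> List[int]:
    """Return the size of each batch produced from dataset_size elements.

    Hypothetical helper that models the batching arithmetic only.
    """
    full_batches, leftover = divmod(dataset_size, batch_size)
    sizes = [batch_size] * full_batches
    # Without drop_remainder, the leftover elements form one short batch.
    if leftover and not drop_remainder:
        sizes.append(leftover)
    return sizes


def static_batch_dim(batch_size: int, drop_remainder: bool) -> Optional[int]:
    """Leading dimension that is known before any data is read.

    Hypothetical helper: None stands for an unknown dimension, since a
    short final batch is possible whenever remainders are kept.
    """
    return batch_size if drop_remainder else None


if __name__ == "__main__":
    print(batch_sizes(14, 5))                       # sizes when remainders are kept
    print(batch_sizes(14, 5, drop_remainder=True))  # sizes when they are dropped
    print(static_batch_dim(5, drop_remainder=False))
    print(static_batch_dim(5, drop_remainder=True))
```

    With 14 elements and a batch size of 5, keeping the remainder gives a final batch of 4, which is why the first dimension is reported as unknown.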

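    You can work around this with dataset.filter and dataset.map: filter out any batch that is shorter than batch_size, then reshape each batch so its first dimension is static.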

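    import tensorflow as tf
    from tensorflow import contrib

    d = contrib.data.Dataset.from_tensor_slices([[5] for x in range(14)])
    batch_size = 5
    d = d.batch(batch_size)
    # Drop the final batch if it has fewer than batch_size elements.
    d = d.filter(lambda e: tf.equal(tf.shape(e)[0], batch_size))
    def batch_reshape(e):
        # Fix the static batch dimension; unknown trailing dimensions become -1.
        return tf.reshape(e, [batch_size] + [s if s is not None else -1 for s in e.shape[1:].as_list()])
    d = d.map(batch_reshape)
    r = d.make_one_shot_iterator().get_next()
    print('dataset_output_shape = %s' % r.shape)
    with tf.Session() as sess:
        # Runs until the iterator is exhausted and raises OutOfRangeError.
        while True:
            print(sess.run(r))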
    

    Output

    dataset_output_shape = (5, 1)

    [[5][5][5][5][5]]

    [[5][5][5][5][5]]

    OutOfRangeError
