What is the difference between Dataset.from_tensors and Dataset.from_tensor_slices?

问题

I have a dataset represented as a NumPy matrix of shape (num_features, num_examples) and I wish to convert it to TensorFlow type tf.Dataset.

I am struggling trying to understand the difference between these two methods: Dataset.from_tensors and Dataset.from_tensor_slices. What is the right one and why?

TensorFlow documentation (link) says that both method accept a nested structure of tensor although when using from_tensor_slices the tensor should have same size in the 0-th dimension.

回答1:

from_tensors combines the input and returns a dataset with a single element:

t = tf.constant([[1, 2], [3, 4]])
ds = tf.data.Dataset.from_tensors(t)   # [[1, 2], [3, 4]]

from_tensor_slices creates a dataset with a separate element for each row of the input tensor:

t = tf.constant([[1, 2], [3, 4]])
ds = tf.data.Dataset.from_tensor_slices(t)   # [1, 2], [3, 4]

回答2:

1) Main difference between the two is that nested elements in from_tensor_slices must have the same dimension in 0th rank:

# exception: ValueError: Dimensions 10 and 9 are not compatible
dataset1 = tf.data.Dataset.from_tensor_slices(
    (tf.random_uniform([10, 4]), tf.random_uniform([9])))
# OK
dataset2 = tf.data.Dataset.from_tensors(
    (tf.random_uniform([10, 4]), tf.random_uniform([9])))

2) The second difference, explained here, is when the input to a tf.Dataset is a list. For example:

dataset1 = tf.data.Dataset.from_tensor_slices(
    [tf.random_uniform([2, 3]), tf.random_uniform([2, 3])])

dataset2 = tf.data.Dataset.from_tensors(
    [tf.random_uniform([2, 3]), tf.random_uniform([2, 3])])

print(dataset1) # shapes: (2, 3)
print(dataset2) # shapes: (2, 2, 3)

In the above, from_tensors creates a 3D tensor while from_tensor_slices merge the input tensor. This can be handy if you have different sources of different image channels and want to concatenate them into a one RGB image tensor.

3) A mentioned in the previous answer, from_tensors convert the input tensor into one big tensor:

import tensorflow as tf

tf.enable_eager_execution()

dataset1 = tf.data.Dataset.from_tensor_slices(
    (tf.random_uniform([4, 2]), tf.random_uniform([4])))

dataset2 = tf.data.Dataset.from_tensors(
    (tf.random_uniform([4, 2]), tf.random_uniform([4])))

for i, item in enumerate(dataset1):
    print('element: ' + str(i + 1), item[0], item[1])

print(30*'-')

for i, item in enumerate(dataset2):
    print('element: ' + str(i + 1), item[0], item[1])

output:

element: 1 tf.Tensor(... shapes: ((2,), ()))
element: 2 tf.Tensor(... shapes: ((2,), ()))
element: 3 tf.Tensor(... shapes: ((2,), ()))
element: 4 tf.Tensor(... shapes: ((2,), ()))
-------------------------
element: 1 tf.Tensor(... shapes: ((4, 2), (4,)))

回答3:

Try this :

import tensorflow as tf  # 1.13.1
tf.enable_eager_execution()

t1 = tf.constant([[11, 22], [33, 44], [55, 66]])

print("\n=========     from_tensors     ===========")
ds = tf.data.Dataset.from_tensors(t1)
print(ds.output_types, end=' : ')
print(ds.output_shapes)
for e in ds:
    print (e)

print("\n=========   from_tensor_slices    ===========")
ds = tf.data.Dataset.from_tensor_slices(t1)
print(ds.output_types, end=' : ')
print(ds.output_shapes)
for e in ds:
    print (e)

output :

=========      from_tensors    ===========
<dtype: 'int32'> : (3, 2)
tf.Tensor(
[[11 22]
 [33 44]
 [55 66]], shape=(3, 2), dtype=int32)

=========   from_tensor_slices      ===========
<dtype: 'int32'> : (2,)
tf.Tensor([11 22], shape=(2,), dtype=int32)
tf.Tensor([33 44], shape=(2,), dtype=int32)
tf.Tensor([55 66], shape=(2,), dtype=int32)

The output is pretty much self-explanatory but as you can see, from_tensor_slices() slices the output of (what would be the output of) from_tensors() on its first dimension. You can also try with :

t1 = tf.constant([[[11, 22], [33, 44], [55, 66]],
                  [[110, 220], [330, 440], [550, 660]]])

回答4:

I think @MatthewScarpino clearly explained the differences between these two methods.

Here I try to describe the typical usage of these two methods:

from_tensors can be used to construct a larger dataset from several small datasets, i.e., the size (length) of the dataset becomes larger;
while from_tensor_slices can be used to combine different elements into one dataset, e.g., combine features and labels into one dataset (that's also why the 1st dimension of the tensors should be the same). That is, the dataset becomes "wider".

来源：https://stackoverflow.com/questions/49579684/what-is-the-difference-between-dataset-from-tensors-and-dataset-from-tensor-slic

标签

python

tensorflow

tensorflow-datasets