Question
I want to set up a data pipeline that works with sequential data. Each data point in a sequence has a fixed dimensionality, e.g. 64x64. I have multiple sequences of variable length, so my dataset can be simplified to:
import numpy as np
seq1 = np.arange(5)[:, None, None]
seq2 = np.arange(8)[:, None, None]
seq3 = np.arange(7)[:, None, None]
sequences = [seq1, seq2, seq3]
Now, I want to operate on a series of time frames within the sequences, resulting in 3-dimensional data cubes [N_frames, data_dim1, data_dim2].
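As a point of reference, here is what such cubes look like for one sequence when everything fits in memory, built directly in NumPy. This is a minimal sketch that is not part of the original question: make_cubes is a hypothetical helper, the frames are placeholder zeros with the 64x64 size mentioned above, and it assumes the sequence has at least size frames.

```python
import numpy as np

def make_cubes(seq, size, shift=1):
    """Hypothetical helper: stack every full window of `size` consecutive frames.

    Returns an array of shape [n_cubes, size, *frame_shape]. Trailing frames
    that cannot fill a whole window are dropped. Assumes len(seq) >= size.
    """
    starts = range(0, len(seq) - size + 1, shift)
    return np.stack([seq[i:i + size] for i in starts])

# Placeholder sequence: 5 frames of 64x64 zeros.
seq = np.zeros((5, 64, 64), dtype=np.float32)
cubes = make_cubes(seq, size=3, shift=1)
print(cubes.shape)  # (3, 3, 64, 64): 3 cubes, each [N_frames=3, 64, 64]
```

Each entry along the first axis is one [N_frames, data_dim1, data_dim2] cube; the goal of the tf.data pipeline is to produce the same cubes lazily for many sequences.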
For a single sequence, I found the window method in TF's Dataset API, which allows me to use windowing to build the data cubes:
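import tensorflow as tf
window = 3
shift = 1
ds = tf.data.Dataset.from_tensor_slices(seq1)
# Each window is itself a small dataset; batch(window) turns it into one [window, 1, 1] tensor.
ds = ds.window(size=window, shift=shift, drop_remainder=True).flat_map(lambda x: x.batch(window))
for d in ds:
    print(d)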
results in
tf.Tensor(
[[[0]]
[[1]]
[[2]]], shape=(3, 1, 1), dtype=int32)
tf.Tensor(
[[[1]]
[[2]]
[[3]]], shape=(3, 1, 1), dtype=int32)
tf.Tensor(
[[[2]]
[[3]]
[[4]]], shape=(3, 1, 1), dtype=int32)
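To make the index pattern explicit, here is a pure-Python model of which elements end up in each window, based on my reading of the tf.data.Dataset.window documentation: window k starts at element k * shift, takes up to size elements spaced stride apart, shift=None means shift=size, and drop_remainder discards incomplete windows. It is a sketch for reasoning about the parameters, not TF's implementation, and window_indices is a hypothetical name.

```python
def window_indices(n, size, shift=None, stride=1, drop_remainder=False):
    """Hypothetical model of the element indices in each window of an n-element dataset."""
    if shift is None:
        shift = size  # default: consecutive, non-overlapping windows
    windows = []
    for start in range(0, n, shift):
        # Elements start, start + stride, ... that still lie inside the dataset.
        idx = [start + j * stride for j in range(size) if start + j * stride < n]
        if drop_remainder and len(idx) < size:
            continue  # incomplete trailing windows are discarded
        windows.append(idx)
    return windows

# The single-sequence example above: 5 frames, size=3, shift=1.
print(window_indices(5, size=3, shift=1, drop_remainder=True))
# [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
```

Each inner list corresponds to one nested window dataset; the .flat_map(lambda x: x.batch(window)) step is what turns each of them into a single [3, 1, 1] tensor, matching the three tensors printed above.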
Now I'm struggling to transfer this operation to my full set of sequences. How can I get all the data cubes from my set of sequences?
Answer 1:
I found an answer myself. I apply the window function to each sequence separately, wrapping this procedure in a small function that is then applied to my set of sequences via flat_map:
import numpy as np
import tensorflow as tf
sequences = [np.arange(5)[:, None, None], np.arange(20, 24)[:, None, None]]
def get_data_cubes(sequence, size, shift=None, stride=1, drop_remainder=False):
    # Window a single sequence and turn each window (itself a dataset) into one tensor.
    ds = tf.data.Dataset.from_tensor_slices(sequence)
    ds = ds.window(size=size, shift=shift, stride=stride, drop_remainder=drop_remainder)
    ds = ds.flat_map(lambda x: x.batch(size))
    return ds
window = 3
shift = 1
# Yield whole sequences one at a time; the leading (time) dimension has variable length.
dataset = tf.data.Dataset.from_generator(lambda: sequences, tf.as_dtype(sequences[0].dtype), tf.TensorShape([None, 1, 1]))
# Window each sequence separately, so no cube spans two sequences.
dataset = dataset.flat_map(lambda x: get_data_cubes(x, window, shift=shift, drop_remainder=True))
for d in dataset:
    print(d)
results in
tf.Tensor(
[[[0]]
[[1]]
[[2]]], shape=(3, 1, 1), dtype=int32)
tf.Tensor(
[[[1]]
[[2]]
[[3]]], shape=(3, 1, 1), dtype=int32)
tf.Tensor(
[[[2]]
[[3]]
[[4]]], shape=(3, 1, 1), dtype=int32)
tf.Tensor(
[[[20]]
[[21]]
[[22]]], shape=(3, 1, 1), dtype=int32)
tf.Tensor(
[[[21]]
[[22]]
[[23]]], shape=(3, 1, 1), dtype=int32)
which is exactly the result I was looking for. BTW, this dataset can be treated like any standard TF dataset, e.g. for shuffling and batching.
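As a sanity check of the output above, the same cubes can be reproduced in NumPy by windowing each sequence on its own and chaining the results, which also shows that no window straddles two sequences. The last step mimics shuffling and batching to show the resulting shape [batch, N_frames, data_dim1, data_dim2]. This is an in-memory sketch, not the tf.data pipeline; cubes_for is a hypothetical helper and the batch size of 2 is arbitrary.

```python
import numpy as np

sequences = [np.arange(5)[:, None, None], np.arange(20, 24)[:, None, None]]
window, shift = 3, 1

def cubes_for(seq, size, shift):
    """Hypothetical helper: all full windows of one sequence as a list of arrays."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, shift)]

# Window each sequence separately, then chain the windows (like flat_map).
all_cubes = np.stack([c for seq in sequences for c in cubes_for(seq, window, shift)])
print(all_cubes[:, :, 0, 0].tolist())
# [[0, 1, 2], [1, 2, 3], [2, 3, 4], [20, 21, 22], [21, 22, 23]]

# Shuffle the cube order, then split into batches of 2 (the last batch is smaller).
rng = np.random.default_rng(0)
shuffled = all_cubes[rng.permutation(len(all_cubes))]
batches = [shuffled[i:i + 2] for i in range(0, len(shuffled), 2)]
print([b.shape for b in batches])
# [(2, 3, 1, 1), (2, 3, 1, 1), (1, 3, 1, 1)]
```

In the tf.data version, the corresponding step would be chaining .shuffle(buffer_size) and .batch(batch_size) onto dataset; shuffling happens at the level of whole cubes, so the frames inside each cube stay in temporal order.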
Source: https://stackoverflow.com/questions/56079722/tensorflow-dataset-api-apply-windows-to-multiple-sequences