Tensorflow: How to prefetch data on the GPU from CPU tf.data.Dataset (from_generator)

Posted by 你说的曾经没有我的故事 on 2021-02-07 10:15:38

Question:


I am struggling with the following. I am creating a tf.data.Dataset using the from_generator method. I perform these actions on CPU as I don't want to overload my GPU memory.

The dataset consists of tuples, each containing a tf.bool 1-D mask (tf.Tensor) with fixed length and a tf.float32 2-D matrix (tf.Tensor) with variable size. The loss function is decorated with the following decorator, so I would not assume the variable size is the problem.

@tf.function(experimental_relax_shapes=True)

Ideally, the dataset is kept on the CPU, but then prefetched onto the GPU.

    def gen():
        for i, j in zip(mask_list, wmat_list):
            yield i, j

    dataset = tf.data.Dataset.from_generator(gen, output_types=(tf.bool, tf.float32))

The main training loop currently relies on tf.identity to move the data to the GPU, which is inefficient, as shown in the TensorBoard screenshot below: roughly 70% of the time is spent loading the data and moving it to the GPU.

    for b, (mask, wmat) in enumerate(dataset):
        with tf.GradientTape() as tape:
            mask = tf.identity(mask)
            wmat = tf.identity(wmat)

            mean_error, loss = self.model.loss(mask, wmat)
            epoch_loss += loss.numpy()
            epoch_mean_error += mean_error.numpy()

I have tried the "prefetch_to_device" function. However, it did not move the data onto the GPU, as verified by printing e.g. mask.device in the training loop.

    gpu_transform = tf.data.experimental.prefetch_to_device('/gpu')
    dataset = dataset.apply(gpu_transform)  # apply() returns a new dataset; the result must be reassigned

To me this resembles this bug: https://github.com/tensorflow/tensorflow/issues/30929. However, it is marked as solved and is over a year old.

I am running TF 2.3 using the official Docker image.


Answer 1:


I have found the solution to my own question.

The problem was that the tuples in the dataset did not contain tf.Tensors but NumPy arrays. Therefore, the pipeline was probably limited by the functionality of py_func().
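A minimal sketch of that fix (the mask_list/wmat_list below are hypothetical stand-ins for the real data): pre-convert each element to a tf.Tensor before yielding, so each element already is a tensor when it crosses the py_func boundary.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: fixed-length boolean masks and
# variable-sized float32 matrices, mirroring the question's setup.
mask_list = [np.array([True, False, True, True]) for _ in range(3)]
wmat_list = [np.random.rand(np.random.randint(2, 5), 4).astype(np.float32)
             for _ in range(3)]

def gen():
    # Yield tf.Tensors instead of raw NumPy arrays.
    for m, w in zip(mask_list, wmat_list):
        yield tf.constant(m), tf.constant(w)

dataset = tf.data.Dataset.from_generator(
    gen, output_types=(tf.bool, tf.float32))
```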

The screenshot below shows that the pipeline no longer blocks on the CPU. However, there is still a considerable MemCpy, and prefetch_to_device() still does not do anything. This is likely due to a known issue which should be fixed in TF 2.4:

https://github.com/tensorflow/tensorflow/issues/35563

The (unconfirmed) suggested workaround also did not work for me (see edit):

with tf.device("/gpu:0"):
    ds = ds.prefetch(1)
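For completeness, another commonly suggested pattern (not verified here to fix the issue above) chains tf.data.experimental.copy_to_device with a device-scoped prefetch. The sketch below guards on GPU availability so it also runs on CPU-only machines:

```python
import tensorflow as tf

def prefetch_to_gpu(dataset, device="/gpu:0"):
    # copy_to_device stages each element onto the target device;
    # prefetch(1) then overlaps that copy with the training step.
    if tf.config.list_physical_devices("GPU"):
        dataset = dataset.apply(tf.data.experimental.copy_to_device(device))
        with tf.device(device):
            return dataset.prefetch(1)
    return dataset.prefetch(1)  # CPU-only fallback
```

Note that both apply() and prefetch() return new datasets, so the result must be reassigned.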

EDIT:

I have further investigated this issue and filed a bug report. It does now seem that the suggested workaround does something, but I am not sure whether it completely prefetches in time: https://github.com/tensorflow/tensorflow/issues/43905
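One way to check whether the workaround actually places elements on the GPU is to inspect the .device attribute of the tensors coming out of the pipeline. A small self-contained check, using a toy dataset in place of the real one:

```python
import tensorflow as tf

# Toy dataset standing in for the real pipeline.
dataset = tf.data.Dataset.from_tensor_slices(tf.range(4))
if tf.config.list_physical_devices("GPU"):
    with tf.device("/gpu:0"):
        dataset = dataset.prefetch(1)

for x in dataset.take(1):
    # The device string ends in 'GPU:0' only if the element
    # was actually placed on the GPU.
    print(x.device)
```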



Source: https://stackoverflow.com/questions/64142435/tensorflow-how-to-prefetch-data-on-the-gpu-from-cpu-tf-data-dataset-from-gener
