Question
After reading through the `tf.data` documentation (here for TF 1.15) and the related TF code (both Python and C++), I realized that most of it seems to run purely on the CPU, except for `PrefetchDataset`.
Is that true?
The documentation for `prefetch_to_device` says:

"NOTE: Although the transformation creates a `tf.data.Dataset`, the transformation must be the final `Dataset` in the input pipeline."

Which suggests that all other datasets cannot handle such a GPU-based dataset.
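For illustration, here is roughly how I read the intended usage, with `prefetch_to_device` as the last transformation (a sketch for TF 1.15 graph mode; the map function, shapes, and device string are just placeholders):

```python
import tensorflow as tf  # TF 1.15

# All preprocessing below runs on the CPU; prefetch_to_device has to be the
# last transformation so that the buffered elements end up on the GPU.
dataset = tf.data.Dataset.from_tensor_slices(tf.zeros([8, 32, 32, 3]))
dataset = dataset.map(tf.image.random_flip_left_right)  # CPU map
dataset = dataset.batch(4)
dataset = dataset.apply(tf.data.experimental.prefetch_to_device("/gpu:0"))

iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
next_batch = iterator.get_next()  # these tensors should now live on /gpu:0
```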
While looking through the code, there seem to be some internal datasets, e.g. `_CopyToDeviceDataset` and `_MapOnGpuDataset`, which might handle GPU datasets.
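For reference, the public `tf.data.experimental.copy_to_device` transformation appears to be built on `_CopyToDeviceDataset` (that is my reading of the code, not something the documentation states). A minimal sketch of how it seems meant to be used:

```python
import tensorflow as tf  # TF 1.15

# Copy the elements to the GPU, then prefetch on the GPU side. As far as I can
# tell, copy_to_device is the public wrapper around _CopyToDeviceDataset.
dataset = tf.data.Dataset.range(10)
dataset = dataset.apply(tf.data.experimental.copy_to_device("/gpu:0"))
with tf.device("/gpu:0"):
    dataset = dataset.prefetch(1)
```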
If I want to do my preprocessing (e.g. data augmentation, or some other non-trivial logic) on the GPU, does that mean I cannot use `tf.data`? (I also want to use graph mode, but I'm not sure whether that is relevant.)
Now I have also found `_GeneratorDataset`. That kernel is also registered for GPU. So does that mean that if my `next_func` returns a tensor on the GPU, it will always stay on the GPU?
Answer 1:
You are correct: `tf.data` currently does all of its processing on the CPU. Usually this is desirable, both to avoid contending with the GPU and because much preprocessing is easier to implement on the CPU. However, there isn't anything fundamentally preventing `tf.data` from doing its processing on the GPU; it's just a matter of implementing such support. From the looks of it, `map_on_gpu` does offer a way to apply a map function on the GPU, though `map_on_gpu` isn't yet exported in the public API. If you're interested in such functionality, please create a GitHub issue describing your use case.
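In the meantime, one possible pattern (a rough sketch, not something prescribed by this answer) is to keep the `tf.data` pipeline on the CPU and apply the GPU-side preprocessing to the dequeued batch inside the model graph:

```python
import tensorflow as tf  # TF 1.15

# Sketch: tf.data stays on the CPU, and the GPU-friendly part of the
# preprocessing (here just a brightness shift) is applied to the dequeued
# batch as an ordinary op inside the model graph.
dataset = tf.data.Dataset.from_tensor_slices(tf.zeros([8, 32, 32, 3]))
dataset = dataset.batch(4).prefetch(1)
images = tf.compat.v1.data.make_one_shot_iterator(dataset).get_next()

with tf.device("/gpu:0"):
    images = tf.image.adjust_brightness(images, delta=0.1)  # runs on the GPU
```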
To answer your last question: if you have a `_GeneratorDataset` that is placed on the GPU, the tensors it produces will also be on the GPU. However, you cannot apply additional dataset transformations other than `prefetch`, because those dataset transformations don't have GPU implementations.
Source: https://stackoverflow.com/questions/61964379/tf-data-dataset-runs-on-cpu-except-of-prefetchdataset