Question
I have 100k pictures, and they don't fit into RAM, so I need to read them from disk while training.
import tensorflow as tf

def extract_fn(x):
    # Read the raw bytes from disk, decode the JPEG, and resize to 64x64
    x = tf.read_file(x)
    x = tf.image.decode_jpeg(x, channels=3)
    x = tf.image.resize_images(x, [64, 64])
    return x

dataset = tf.data.Dataset.from_tensor_slices(in_pics)
dataset = dataset.map(extract_fn)
But when I try to train, I get this error:
File system scheme '[local]' not implemented (file: '/content/anime-faces/black_hair/danbooru_2629248_487b383a8a6e7cc0e004383300477d66.jpg')
Can I work around this somehow? I also tried the TFRecord API and got the same error.
Answer 1:
The Cloud TPU you use in this scenario is not colocated on the same VM where your Python runs, so it cannot see the Colab VM's local disk. The easiest fix is to stage your data on GCS and point the TPU at it with a gs:// URI.
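A minimal sketch of that change, assuming the images have been uploaded to a hypothetical bucket gs://my-bucket (e.g. with gsutil -m cp -r /content/anime-faces gs://my-bucket/anime-faces) and reusing the extract_fn from the question; TensorFlow's file ops such as tf.read_file and tf.gfile transparently handle gs:// paths:

# 'my-bucket' is a hypothetical bucket name; replace with your own
in_pics = tf.gfile.Glob('gs://my-bucket/anime-faces/*/*.jpg')

dataset = tf.data.Dataset.from_tensor_slices(in_pics)
dataset = dataset.map(extract_fn)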
To optimize performance when reading from GCS, add prefetch(AUTOTUNE) to your tf.data pipeline, and for small (<50GB) datasets also use cache().
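A sketch of what that pipeline could look like; it assumes TF 1.13+, where AUTOTUNE lives under tf.data.experimental. Caching after the map stores the decoded 64x64 images, which for 100k files is well under the 50GB guideline:

dataset = (tf.data.Dataset.from_tensor_slices(in_pics)
           .map(extract_fn)
           .cache()                                   # keep decoded images after the first epoch
           .prefetch(tf.data.experimental.AUTOTUNE))  # overlap GCS reads with training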
Source: https://stackoverflow.com/questions/53347293/google-colab-tpu-and-reading-from-disc-while-traning