Question
I have an MNIST-like dataset that does not fit in memory (process memory, not GPU memory). My dataset is 4 GB. This is not a TFLearn issue.

As far as I know, model.fit requires arrays for x and y.
TFLearn example:
model.fit(x, y, n_epoch=10, validation_set=(val_x, val_y))
I was wondering if there's a way to pass a "batch iterator" instead of an array, so that for each batch I could load the necessary data from disk. That way I would not run into process-memory overflow errors.
EDIT
np.memmap could be an option, but I don't see how to skip the first few bytes that make up the header.
Answer 1:
You can use the tf.data Dataset API:

"The Dataset API supports a variety of file formats so that you can process large datasets that do not fit in memory."

Basically, the input pipeline becomes part of your graph.
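For example, the raw MNIST IDX files can be read directly with a fixed-length record dataset, which also takes care of skipping the header bytes. A minimal sketch (TF 1.x graph mode; the filenames and batch size are placeholders, and the 16-byte/8-byte header sizes come from the standard IDX layout):

import tensorflow as tf

IMAGE_BYTES = 28 * 28  # one raw MNIST image per record

images = tf.data.FixedLengthRecordDataset(
    "train-images-idx3-ubyte", record_bytes=IMAGE_BYTES, header_bytes=16)
labels = tf.data.FixedLengthRecordDataset(
    "train-labels-idx1-ubyte", record_bytes=1, header_bytes=8)

def decode_image(raw):
    # Raw bytes -> float32 pixels in [0, 1], shaped (28, 28).
    img = tf.decode_raw(raw, tf.uint8)
    return tf.cast(tf.reshape(img, [28, 28]), tf.float32) / 255.0

def decode_label(raw):
    # Single byte -> int32 class id.
    return tf.cast(tf.decode_raw(raw, tf.uint8)[0], tf.int32)

dataset = (tf.data.Dataset.zip((images.map(decode_image),
                                labels.map(decode_label)))
           .shuffle(10000)
           .batch(128)
           .repeat())

batch_x, batch_y = dataset.make_one_shot_iterator().get_next()
# batch_x / batch_y are graph tensors; only one batch is materialized at a time.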
If memory is still an issue, you can use a generator to create your tf.data.Dataset, as sketched below. Further, you could potentially make the process quicker by preparing TFRecord files and building your Dataset from them.
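A rough sketch of the generator route, assuming (purely for illustration) that the data has been split into per-batch "batches/batch_<i>.npz" files on disk, each holding "x" (images) and "y" (labels) arrays:

import numpy as np
import tensorflow as tf

NUM_BATCHES = 500  # placeholder: however many batch files are on disk

def batch_generator():
    # Load one pre-built batch file from disk at a time.
    for i in range(NUM_BATCHES):
        data = np.load("batches/batch_%d.npz" % i)  # hypothetical file layout
        yield data["x"].astype(np.float32), data["y"].astype(np.int32)

dataset = tf.data.Dataset.from_generator(
    batch_generator,
    output_types=(tf.float32, tf.int32),
    output_shapes=((None, 28, 28), (None,)))

batch_x, batch_y = dataset.make_one_shot_iterator().get_next()

Each call to get_next() pulls one batch from the generator, so only the current batch lives in process memory.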
Source: https://stackoverflow.com/questions/46637347/dataset-does-not-fit-in-memory