Dataset does not fit in memory

Submitted by 放肆的年华 on 2021-01-27 07:08:45

Question


I have an MNIST-like dataset that does not fit in memory (process memory, not GPU memory). My dataset is 4GB.

This is not a TFLearn issue.

As far as I know, model.fit requires arrays for x and y.

TFLearn example:

model.fit(x, y, n_epoch=10, validation_set=(val_x, val_y))

I was wondering if there's a way to pass a "batch iterator" instead of an array. Basically, for each batch I would load only the necessary data from disk.

This way I would not run into process memory overflow errors.

EDIT: np.memmap could be an option, but I don't see how to skip the first few bytes that make up the header.
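
For reference, np.memmap takes an offset argument that skips a fixed number of leading bytes. A minimal sketch, assuming the standard MNIST IDX layout (16-byte header for the image file, 8-byte header for the label file) and hypothetical file names:

import numpy as np

# Hypothetical paths -- adjust to your own files.
# The image file starts with a 16-byte header (magic number, count, rows, cols),
# followed by raw uint8 pixels; offset=16 skips it.
images = np.memmap("train-images-idx3-ubyte", dtype=np.uint8,
                   mode="r", offset=16).reshape(-1, 28, 28)

# The label file starts with an 8-byte header (magic number, item count).
labels = np.memmap("train-labels-idx1-ubyte", dtype=np.uint8,
                   mode="r", offset=8)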


Answer 1:


You can use the Dataset API.

"The Dataset API supports a variety of file formats so that you can process large datasets that do not fit in memory"

Basically, the input pipeline becomes part of your graph.
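
A rough sketch of such a file-based pipeline, using the TensorFlow 2.x tf.data API and assuming the data has already been serialized into a TFRecord file (the file name and the feature names "image" and "label" are illustrative):

import tensorflow as tf

# Parse one serialized tf.train.Example back into (image, label) tensors.
def parse_example(serialized):
    features = tf.io.parse_single_example(
        serialized,
        {
            "image": tf.io.FixedLenFeature([28 * 28], tf.float32),
            "label": tf.io.FixedLenFeature([], tf.int64),
        },
    )
    return features["image"], features["label"]

dataset = (
    tf.data.TFRecordDataset(["train.tfrecords"])  # hypothetical file name
    .map(parse_example)
    .shuffle(buffer_size=10_000)
    .batch(128)
    .prefetch(tf.data.AUTOTUNE)
)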

If memory is still an issue, you can use a generator to create your tf.data.Dataset. Furthermore, you could potentially speed things up by preparing TFRecords to build your Dataset.
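
A minimal sketch of the generator approach, assuming the images and labels memmaps from the question above, so that only the slices actually yielded are read from disk (output_signature requires TensorFlow 2.4+):

import numpy as np
import tensorflow as tf

batch_size = 128

# Yield one (x, y) batch at a time; the memmaps are read lazily per slice.
def batch_generator():
    for start in range(0, len(labels), batch_size):
        x = np.asarray(images[start:start + batch_size], dtype=np.float32) / 255.0
        y = np.asarray(labels[start:start + batch_size], dtype=np.int64)
        yield x, y

dataset = tf.data.Dataset.from_generator(
    batch_generator,
    output_signature=(
        tf.TensorSpec(shape=(None, 28, 28), dtype=tf.float32),
        tf.TensorSpec(shape=(None,), dtype=tf.int64),
    ),
).prefetch(1)

With tf.keras (rather than TFLearn), such a dataset can be passed directly to model.fit(dataset, epochs=10).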



Source: https://stackoverflow.com/questions/46637347/dataset-does-not-fit-in-memory
