I have a specific case where the networks are relatively tiny, and for convergence and generalization reasons I need to maintain small batch sizes (e.g. 256), which leads to hundr
You don't have to load the whole dataset into memory. You can ingest the data piece by piece using the tf.data.Dataset API.
TensorFlow can take care of loading the next batches while your GPU is crunching the current one. You can follow the steps below.
You can check the example listed here.
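A minimal sketch of such a pipeline, assuming your examples can be produced one at a time by a Python generator (the generator here yields random data purely as a placeholder; swap in your own per-example loading, e.g. reading records from disk):

```python
import numpy as np
import tensorflow as tf

def sample_generator():
    # Hypothetical example source: yields one (features, label) pair at a
    # time, so the full dataset never has to sit in memory at once.
    for _ in range(10000):
        yield np.random.rand(32).astype(np.float32), np.int64(np.random.randint(0, 2))

dataset = (
    tf.data.Dataset.from_generator(
        sample_generator,
        output_signature=(
            tf.TensorSpec(shape=(32,), dtype=tf.float32),
            tf.TensorSpec(shape=(), dtype=tf.int64),
        ),
    )
    .batch(256)                  # the small batch size from the question
    .prefetch(tf.data.AUTOTUNE)  # prepare upcoming batches while the GPU trains
)

batch_features, batch_labels = next(iter(dataset))
print(batch_features.shape)  # (256, 32)
```

`prefetch(tf.data.AUTOTUNE)` is what overlaps data loading with GPU compute; you can pass the resulting `dataset` directly to `model.fit(dataset)`.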
Hope this is helpful.