Question
I am very new to ML with big data. I have played with the generic Keras convolutional examples for dog/cat classification before; however, when applying a similar approach to my own set of images, I run into memory issues.
My dataset consists of very large images that are 10048 x 1687 pixels in size. To circumvent the memory issues, I am using a batch size of 1, feeding one image at a time into the model.
The model has two convolutional layers, each followed by max-pooling, which together bring the flattened layer to roughly 290,000 inputs right before the fully-connected layer.
Immediately after running, however, memory usage hits its limit (8 GB).
So my questions are the following:
1) What is the best approach to handle computations of this size in Python locally (no cloud utilization)? Are there additional Python libraries that I need to use?
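For reference, the model described above looks roughly like this (a minimal sketch; the filter counts, kernel/pool sizes and channel count are assumptions, so the flattened size will not match the ~290,000 figure exactly):
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Rough sketch: two conv + max-pooling blocks, then a flatten feeding a
# fully-connected layer. Filter counts, kernel/pool sizes and the single
# input channel are assumptions, not taken from the question.
model = Sequential([
    Conv2D(16, (3, 3), activation='relu', input_shape=(1687, 10048, 1)),
    MaxPooling2D((4, 4)),
    Conv2D(16, (3, 3), activation='relu'),
    MaxPooling2D((4, 4)),
    Flatten(),                      # produces a very large flattened vector
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])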
Answer 1:
Check out what yield does in Python and the idea of generators. You do not need to load all of your data at the start. Just make your batch_size small enough that you do not run into memory errors.
Your generator can look like this:
def generator(fileobj, labels, batch_size, memory_one_pic=1024):
    # memory_one_pic: number of bytes one picture occupies in the file.
    amount_of_datasets = len(labels)
    start = 0
    end = start + batch_size
    while True:
        # Read only the bytes needed for one batch instead of the whole file.
        X_batch = fileobj.read(memory_one_pic * batch_size)
        y_batch = labels[start:end]
        start += batch_size
        end += batch_size
        if not X_batch:
            break
        if start >= amount_of_datasets:
            # Wrap around so the generator can be iterated for several epochs.
            fileobj.seek(0)
            start = 0
            end = batch_size
        yield (X_batch, y_batch)
...later when you already have your architecture ready...
amount_of_datasets = len(labels)   # total number of training samples
train_generator = generator(open('traindata.csv', 'rb'), labels, batch_size)
train_steps = amount_of_datasets // batch_size + 1
model.fit_generator(generator=train_generator,
                    steps_per_epoch=train_steps,
                    epochs=epochs)
You should also read about batch_normalization, which basically helps the network learn faster and reach better accuracy.
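A minimal sketch of where such a layer might go (the filter count and input shape below are placeholders, not taken from the question):
from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Activation

# BatchNormalization is commonly placed between a convolution and its activation.
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(1687, 10048, 1)))
model.add(BatchNormalization())
model.add(Activation('relu'))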
Answer 2:
While using fit_generator(), you should also set the max_q_size parameter. It is set to 10 by default, which means you are loading 10 batches while using only 1 (since fit_generator() was designed to stream data from outside sources that can be delayed, like a network, not to save memory). I'd recommend setting max_q_size=1 for your purposes.
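Reusing the names from the first answer, that could look like this (assuming an older Keras version where the argument is still spelled max_q_size; more recent releases renamed it to max_queue_size):
# max_q_size limits how many batches the generator queue pre-loads.
model.fit_generator(generator=train_generator,
                    steps_per_epoch=train_steps,
                    epochs=epochs,
                    max_q_size=1)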
Source: https://stackoverflow.com/questions/44569938/memory-issues-using-keras-convolutional-network