Question
I am using the TensorFlow Object Detection API to train my own object detector. I downloaded faster_rcnn_inception_v2_coco_2018_01_28 from the model zoo (here) and made my own dataset (train.record (~221 MB), test.record and the label map) to fine-tune it.
But when I run it:
python train.py --logtostderr --pipeline_config_path=/home/username/Documents/Object_Detection/training/faster_rcnn_inception_v2_coco_2018_01_28/pipeline.config --train_dir=/home/username/Documents/Object_Detection/training/
the process is killed while filling up the shuffle buffer. It looks like an out-of-memory (OOM) problem (I have 16 GB of RAM):
2018-06-07 12:02:51.107021: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:94] Filling up shuffle buffer (this may take a while): 410 of 2048
Process stopped
Is there a way to reduce the shuffle buffer size? What impact does its size have?
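As a rough illustration of what the buffer size controls, here is a simplified pure-Python model of a fixed-size shuffle buffer (a sketch, not TensorFlow's actual implementation; `buffered_shuffle` is a hypothetical name). Like `tf.data.Dataset.shuffle(buffer_size)`, it has to fill the buffer before it emits the first element, and it holds up to `buffer_size` elements at once, so memory grows linearly with the buffer size while a larger buffer mixes the stream more thoroughly. In the Object Detection API this size appears to come from the `shuffle_buffer_size` field of `train_input_reader` (default 2048, matching the log line above), but check the `input_reader.proto` of your version.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Simplified model of a fixed-size shuffle buffer (not TensorFlow's code).

    At most `buffer_size` elements are held at once, so peak memory is roughly
    buffer_size * (size of one element).
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be >= 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Buffer is full: emit one randomly chosen element to make room.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Input exhausted: drain whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


if __name__ == "__main__":
    print(list(buffered_shuffle(range(10), buffer_size=1)))   # no shuffling: [0, 1, ..., 9]
    print(list(buffered_shuffle(range(10), buffer_size=10)))  # a full random permutation
```

With `buffer_size=1` the order is unchanged and only one element is buffered; with a buffer at least as large as the input, the result is a full random permutation. The trade-off is mixing quality versus memory, which matters a lot when each element is a ~25 MB image.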
Then I added some swap (115 GB of swap + 16 GB of RAM) and the shuffle-buffer fill finished, but my training consumed all the RAM and swap after step 4, even though my train.record is only about 221 MB!
I have already added these lines to my pipeline.config > train_config:
batch_size: 1
batch_queue_capacity: 10
num_batch_queue_threads: 8
prefetch_queue_capacity: 9
and these to my pipeline.config > train_input_reader:
queue_capacity: 2
min_after_dequeue: 1
num_readers: 1
following this post.
I know my images are very (very, very) large, about 25 MB each, but since I only used 9 images to build my train.record (just to test whether my installation works), it shouldn't be this memory-hungry, right?
Any other idea why it uses so much RAM?
(BTW, I am only using the CPU.)
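A back-of-the-envelope estimate suggests why 9 images can still exhaust 16 GB. This sketch assumes each shuffle-buffer slot holds one serialized record of average size and that the record stream is repeated before shuffling (which, as far as I know, is how the API's input pipeline behaves), so 9 records can fill all 2048 slots with copies. It uses decimal units and ignores decoding, queues and the model itself; `shuffle_buffer_bytes` is a hypothetical helper.

```python
def shuffle_buffer_bytes(record_file_bytes: float, num_records: int, filled_slots: int) -> float:
    """Rough lower bound on shuffle-buffer memory.

    Assumes every slot holds one serialized record of average size and that the
    input is repeated, so even a tiny dataset can fill every slot.
    """
    if num_records <= 0:
        raise ValueError("num_records must be positive")
    avg_record_bytes = record_file_bytes / num_records
    return avg_record_bytes * filled_slots


if __name__ == "__main__":
    train_record_bytes = 221e6  # ~221 MB train.record from the question
    for slots in (410, 2048):  # last logged fill level, and the full buffer
        gb = shuffle_buffer_bytes(train_record_bytes, 9, slots) / 1e9
        print(f"{slots:>4} slots -> ~{gb:.1f} GB")
```

Under these assumptions this prints about 10.1 GB for the 410 slots in the last log line and about 50.3 GB for a full 2048-slot buffer, before a single image is decoded.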
Answer 1:
The number of images is not the problem; the problem is the input image resolution set in your .config file. You need to change the height and width values here (in the equivalent block of your .config file):
image_resizer {
# TODO(shlens): Only fixed_shape_resizer is currently supported for NASNet
# featurization. The reason for this is that nasnet.py only supports
# inputs with fully known shapes. We need to update nasnet.py to handle
# shapes not known at compile time.
fixed_shape_resizer {
height: 1200
width: 1200
}
}
Set width and height to smaller values and you will be able to train without problems.
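To see why height and width matter so much, here is a rough sketch (my own arithmetic, not something taken from the API): one float32 input batch of shape [batch, height, width, 3] takes batch × height × width × 3 × 4 bytes, and the convolutional feature maps of the backbone scale with height × width in the same way, so halving both sides cuts that memory by about four. `input_tensor_bytes` is a hypothetical helper, and float32 pixels with 3 channels are assumptions.

```python
def input_tensor_bytes(height: int, width: int, channels: int = 3,
                       batch_size: int = 1, bytes_per_value: int = 4) -> int:
    """Bytes for one dense image batch of shape [batch, height, width, channels].

    bytes_per_value=4 assumes float32 pixels after preprocessing.
    """
    return batch_size * height * width * channels * bytes_per_value


if __name__ == "__main__":
    for side in (1200, 600, 300):
        mib = input_tensor_bytes(side, side) / 2**20
        print(f"{side}x{side}: {mib:.1f} MiB per image")
```

The input tensor alone is modest; the point is the quadratic scaling in height and width, which carries over to the intermediate activations as well.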
Source: https://stackoverflow.com/questions/50742757/tensorflow-object-detection-api-out-of-memory