Question
I am retraining the TF Object Detection API's mobilenet(v1)-SSD and I'm running into the following error at the training step.
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path xxxx/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, indices[3] = 3 is not in [0, 3)
[[Node: cond_2/RandomCropImage/PruneCompleteleyOutsideWindow/Gather/Gather_1 = Gather[Tindices=DT_INT64, Tparams=DT_INT64, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](cond_2/Switch_3:1, cond_2/RandomCropImage/PruneCompleteleyOutsideWindow/Reshape)]]
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Caught OutOfRangeError. Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
File "object_detection/train.py", line 168, in <module>
tf.app.run()
File "/home/khatta/.virtualenvs/dl/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 124, in run
_sys.exit(main(argv))
File "object_detection/train.py", line 165, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "xxxx/research/object_detection/trainer.py", line 361, in train
saver=saver)
File "/home/khatta/.virtualenvs/dl/lib/python3.5/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 782, in train
ignore_live_threads=ignore_live_threads)
File "/home/khatta/.virtualenvs/dl/lib/python3.5/site-packages/tensorflow/python/training/supervisor.py", line 826, in stop
ignore_live_threads=ignore_live_threads)
File "/home/khatta/.virtualenvs/dl/lib/python3.5/site-packages/tensorflow/python/training/coordinator.py", line 387, in join
six.reraise(*self._exc_info_to_raise)
File "/home/khatta/.virtualenvs/dl/lib/python3.5/site-packages/six.py", line 693, in reraise
raise value
File "/home/khatta/.virtualenvs/dl/lib/python3.5/site-packages/tensorflow/python/training/queue_runner_impl.py", line 250, in _run
enqueue_callable()
File "/home/khatta/.virtualenvs/dl/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1251, in _single_operation_run
self._session, None, {}, [], target_list, status, None)
File "/home/khatta/.virtualenvs/dl/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[3] = 3 is not in [0, 3)
[[Node: cond_2/RandomCropImage/PruneCompleteleyOutsideWindow/Gather/Gather_1 = Gather[Tindices=DT_INT64, Tparams=DT_INT64, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](cond_2/Switch_3:1, cond_2/RandomCropImage/PruneCompleteleyOutsideWindow/Reshape)]]
This error happens right at the start when I prepare the TFRecord file with a comparatively large amount of data (around 16K images). When I use a small amount of data (around 1K images), the error happens after around 100 training steps, with the same error structure.
The structure of the TFRecord-creating script is shown below. I wanted to tile the large images so that the annotations wouldn't become too small at SSD's 300x300 resizing step, which I thought would give better results:
import tensorflow as tf
import pandas as pd
import cv2  # needed for imencode/imread/cvtColor below
import hashlib

def _tiling(image_array, labels, tile_size=(300, 300)):
    '''tile image according to the tile_size argument'''
    <do stuff>
    yield tiled_image_array, tiled_label

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _int64_list_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _bytes_list_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))

def _float_list_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

def _make_tfexample(tiled_image_array, tiled_label):
    img_str = cv2.imencode('.jpg', tiled_image_array)[1].tobytes()
    height, width, _ = tiled_image_array.shape
    # tiled_label's contents:
    # ['tilename', ['object_name', 'object_name', ...],
    #  [xmin, xmin, ...], [ymin, ymin, ...],
    #  [xmax, xmax, ...], [ymax, ymax, ...]]
    tile_name, object_names, xmins, ymins, xmaxs, ymaxs = tiled_label
    filename = bytes(tile_name, 'utf-8')
    image_format = b'jpeg'
    key = hashlib.sha256(img_str).hexdigest()
    # normalize box coordinates to [0, 1]
    xmins = [xmin / width for xmin in xmins]
    ymins = [ymin / height for ymin in ymins]
    xmaxs = [xmax / width for xmax in xmaxs]
    ymaxs = [ymax / height for ymax in ymaxs]
    classes_text = [bytes(obj, 'utf-8') for obj in object_names]
    # category => {'object_name': #id, ...}
    classes = [category[obj] for obj in object_names]
    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': _int64_feature(height),
        'image/width': _int64_feature(width),
        'image/filename': _bytes_feature(filename),
        'image/source_id': _bytes_feature(filename),
        'image/key/sha256': _bytes_feature(key.encode('utf-8')),
        'image/encoded': _bytes_feature(img_str),
        'image/format': _bytes_feature(image_format),
        'image/object/bbox/xmin': _float_list_feature(xmins),
        'image/object/bbox/ymin': _float_list_feature(ymins),
        'image/object/bbox/xmax': _float_list_feature(xmaxs),
        'image/object/bbox/ymax': _float_list_feature(ymaxs),
        'image/object/class/text': _bytes_list_feature(classes_text),
        'image/object/class/label': _int64_list_feature(classes)
    }))
    return tf_example

def make_tfrecord(image_path, csv_path, tfrecord_path):
    '''convert images and labels into a tfrecord file'''
    csv = pd.read_csv(csv_path)
    with tf.python_io.TFRecordWriter(tfrecord_path) as writer:
        for row in csv.itertuples():
            img_array = cv2.imread(image_path + row.filename)
            img_array = cv2.cvtColor(img_array, cv2.COLOR_BGR2RGB)
            tile_generator = _tiling(img_array, row.label)
            for tiled_image_array, tiled_label in tile_generator:
                tf_example = _make_tfexample(tiled_image_array, tiled_label)
                writer.write(tf_example.SerializeToString())
Any suggestions on why this error might be happening are welcome. Thank you in advance!
Answer 1:
This was caused by the length of the object_names list not matching the lengths of the other per-object lists (xmins, ymins, xmaxs, ymaxs, classes).
The cause was a bug in my own code, but I'm posting this FYI in case you get a similar error and need a hint for debugging it.
In short, in the _make_tfexample function above you need
xmins = [a_xmin, b_xmin, c_xmin]
ymins = [a_ymin, b_ymin, c_ymin]
xmaxs = [a_xmax, b_xmax, c_xmax]
ymaxs = [a_ymax, b_ymax, c_ymax]
classes_text = [a_class, b_class, c_class]
classes = [a_classid, b_classid, c_classid]
so that corresponding indices across the lists all refer to the same object. The error above is raised when the lengths of these lists don't match.
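As a quick way to confirm this, here is a minimal sketch (not from the original post; the helper names assert_box_lists_match and check_tfrecord are made up for illustration, and it assumes the TF 1.x tf.python_io API used in the question). It asserts the lengths while building each example, and reads a finished TFRecord back to report examples whose per-object lists disagree:
import tensorflow as tf

def assert_box_lists_match(xmins, ymins, xmaxs, ymaxs, classes_text, classes):
    # every per-object list must have exactly one entry per bounding box
    lengths = {len(xmins), len(ymins), len(xmaxs), len(ymaxs),
               len(classes_text), len(classes)}
    assert len(lengths) == 1, 'mismatched list lengths: %r' % lengths

def check_tfrecord(tfrecord_path):
    '''Iterate a finished TFRecord and print any example whose
    bbox/class feature lists have different lengths.'''
    for i, record in enumerate(tf.python_io.tf_record_iterator(tfrecord_path)):
        example = tf.train.Example()
        example.ParseFromString(record)
        feature = example.features.feature
        lengths = {
            key: len(feature[key].float_list.value)
            for key in ('image/object/bbox/xmin', 'image/object/bbox/ymin',
                        'image/object/bbox/xmax', 'image/object/bbox/ymax')
        }
        lengths['image/object/class/text'] = len(
            feature['image/object/class/text'].bytes_list.value)
        lengths['image/object/class/label'] = len(
            feature['image/object/class/label'].int64_list.value)
        if len(set(lengths.values())) != 1:
            print('example %d has mismatched lengths: %s' % (i, lengths))
Running check_tfrecord on the generated file should point you at exactly which examples contain mismatched lists, which is usually enough to track the bug back to the labeling code.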
Answer 2:
I also came across the same error and went from page to page trying to find an answer. Unfortunately, the shape of the data and labels was not the reason I was getting this error. I found the same question in multiple places on Stack Overflow, so check this to see if it solves your problem.
Source: https://stackoverflow.com/questions/49730923/tensorflow-object-detection-api-indices3-3-is-not-in-0-3-error