Question
I am training a ResNet-50 with TensorFlow on a shared server with the following setup:
Ubuntu 16.04, 3 GTX 1080 GPUs, TensorFlow 1.3, Python 2.7.
Every time, after two epochs have completed and during the third epoch, I encounter this error:
terminate called after throwing an instance of 'std::system_error'
  what():  Resource temporarily unavailable
Aborted
This is the code that converts the TFRecord file into a dataset:
filenames = ["balanced_t.tfrecords"]
dataset = tf.contrib.data.TFRecordDataset(filenames)

def parser(record):
    keys_to_features = {
        "mhot_label_raw": tf.FixedLenFeature((), tf.string,
                                             default_value=""),
        "mel_spec_raw": tf.FixedLenFeature((), tf.string,
                                           default_value=""),
    }
    parsed = tf.parse_single_example(record, keys_to_features)
    mel_spec1d = tf.decode_raw(parsed['mel_spec_raw'], tf.float64)
    # label = tf.cast(parsed["label"], tf.string)
    mhot_label = tf.decode_raw(parsed['mhot_label_raw'], tf.float64)
    mel_spec = tf.reshape(mel_spec1d, [96, 64])
    return {"mel_data": mel_spec}, mhot_label

dataset = dataset.map(parser)
dataset = dataset.batch(batch_size)
dataset = dataset.repeat(3)
iterator = dataset.make_one_shot_iterator()
And this is the input pipeline:
while True:
    try:
        (features, labels) = sess.run(iterator.get_next())
    except tf.errors.OutOfRangeError:
        print("end of training dataset")
After inserting some print statements into my code, I discovered that the line below causes this error:
(features, labels) = sess.run(iterator.get_next())
But I can't figure out how to solve it.
Answer 1:
Your code has a (subtle) memory leak, so it's possible that the process is running out of memory and being terminated. The issue is that calling iterator.get_next() in each loop iteration will add a new node to the TensorFlow graph, which will end up consuming a lot of memory.
To stop the memory leak, rewrite your while loop as the following:
# Call `get_next()` once outside the loop to create the TensorFlow operations once.
next_element = iterator.get_next()

while True:
    try:
        (features, labels) = sess.run(next_element)
    except tf.errors.OutOfRangeError:
        print("end of training dataset")
        # Leave the loop once the dataset is exhausted; otherwise it would spin forever.
        break
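To make the difference between the two loops concrete, here is a minimal pure-Python sketch of the mechanism. It is a toy model, not TensorFlow code: ToyGraph, ToyIterator, run_leaky and run_fixed are hypothetical names invented for illustration, and the only behavior borrowed from TensorFlow is the assumption that every get_next() call adds one node to the graph.

```python
class ToyGraph(object):
    """Hypothetical stand-in for a TF 1.x graph: it only records added nodes."""

    def __init__(self):
        self.nodes = []

    def add_node(self, name):
        self.nodes.append(name)
        return name


class ToyIterator(object):
    """Hypothetical stand-in for a dataset iterator bound to a graph."""

    def __init__(self, graph):
        self.graph = graph

    def get_next(self):
        # Assumption mirrored from the answer: each call builds a new node.
        name = "IteratorGetNext_%d" % len(self.graph.nodes)
        return self.graph.add_node(name)


def run_leaky(num_steps):
    """Call get_next() inside the loop, as in the question."""
    graph = ToyGraph()
    iterator = ToyIterator(graph)
    for _ in range(num_steps):
        iterator.get_next()  # adds one node per step
    return len(graph.nodes)


def run_fixed(num_steps):
    """Call get_next() once before the loop, as in the answer."""
    graph = ToyGraph()
    iterator = ToyIterator(graph)
    next_element = iterator.get_next()  # the only node ever created
    for _ in range(num_steps):
        _ = next_element  # every step reuses the same node
    return len(graph.nodes)


leaky_nodes = run_leaky(1000)
fixed_nodes = run_fixed(1000)
print(leaky_nodes, fixed_nodes)  # 1000 nodes versus 1 node
```

In real TensorFlow 1.x code, one way to catch this kind of mistake early is to call sess.graph.finalize() before entering the training loop: the graph becomes read-only, so any later attempt to add an operation raises an error instead of silently growing the graph.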
Source: https://stackoverflow.com/questions/47499138/iterator-get-next-cause-terminate-called-after-throwing-an-instance-of-stds