TPU training freezes in the middle of training
Question

I'm trying to train a CNN regression net in TF 1.12 on a TPU v3-8 instance (version 1.12). The model successfully compiles with XLA and the training process starts, but somewhere past the halfway point of the 1st epoch's iterations it freezes and does nothing. I cannot find the root of the problem.

    def read_tfrecord(example):
        features = {
            'image': tf.FixedLenFeature([], tf.string),
            'labels': tf.FixedLenFeature([], tf.string)
        }
        sample = tf.parse_single_example(example, features)
        image = tf.image.decode_jpeg(sample[