Question
We are trying to build an image segmentation deep learning model on a Google Colab TPU. Our model is Mask R-CNN.
import os
import tensorflow as tf

TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']

# model is our Mask R-CNN instance; model.keras_model is its underlying Keras model
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    model.keras_model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))
However, I am running into the issue below while converting our Mask R-CNN model to a TPU model.
ValueError:
Layer <keras.engine.topology.InputLayer object at 0x7f58574f1940> has a
variable shape in a non-batch dimension. TPU models must
have constant shapes for all operations.
You may have to specify `input_length` for RNN/TimeDistributed layers.
Layer: <keras.engine.topology.InputLayer object at 0x7f58574f1940>
Input shape: (None, None, None, 3)
Output shape: (None, None, None, 3)
Appreciate any help.
Answer 1:
Google recently released a tutorial on getting Mask R-CNN going on their TPUs. For this, they use an experimental Mask R-CNN model in Google's TPU GitHub repository (under models/experimental/mask_rcnn). Looking through the code, they define the model with a fixed input size to overcome the issue you are seeing.
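To illustrate the idea (a minimal Keras sketch, not the actual Mask R-CNN definition; the 1024x1024 size is an arbitrary example):

import tensorflow as tf

# An input like this, with variable spatial dimensions, is what triggers
# your ValueError -- shape (None, None, None, 3) has None in non-batch dims:
# image_input = tf.keras.layers.Input(shape=(None, None, 3))

# Declaring every non-batch dimension as a constant gives XLA the static
# shapes it needs; only the batch dimension may remain None here.
image_input = tf.keras.layers.Input(shape=(1024, 1024, 3), name='input_image')

If you are using the Matterport Mask R-CNN implementation (which the keras.engine.topology reference in your traceback suggests), the variable-shape input is hard-coded in its model.py, so you would have to change it there and resize your images to match.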
See below for more explanation:
As @aman2930 points out, the shape of your input tensor is not static. This won't work because TensorFlow compiles models with XLA to run on a TPU, and XLA requires all tensor shapes to be defined at compile time. The documentation linked above specifically calls this out:
Static shapes
During regular usage TensorFlow attempts to determine the shapes of each tf.Tensor during graph construction. During execution any unknown shape dimensions are determined dynamically, see Tensor Shapes for more details.
To run on Cloud TPUs TensorFlow models are compiled using XLA. XLA uses a similar system for determining shapes at compile time. XLA requires that all tensor dimensions be statically defined at compile time. All shapes must evaluate to a constant, and not depend on external data, or stateful operations like variables or a random number generator.
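To see which tensors are the problem before attempting the conversion, you can scan the model for unknown dimensions. A small sketch, assuming model.keras_model is your Keras Mask R-CNN model as in your snippet:

# Print every layer whose output has an unknown (None) non-batch
# dimension -- these are the shapes XLA cannot compile.
for layer in model.keras_model.layers:
    try:
        shapes = layer.output_shape
    except AttributeError:
        continue  # layer reused on multiple inputs; skip for this check
    if not isinstance(shapes, list):  # normalize multi-output layers
        shapes = [shapes]
    for shape in shapes:
        if any(dim is None for dim in shape[1:]):  # ignore the batch dim
            print(layer.name, shape)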
That said, further down the document, they mention that the input function runs on the CPU, so it isn't limited by static XLA sizes. They point to batch size being the issue, not image size:
Static shapes and batch size
The input pipeline generated by your input_fn is run on the CPU. So it is mostly free from the strict static shape requirements imposed by the XLA/TPU environment. The one requirement is that the batches of data fed from your input pipeline to the TPU have a static shape, as determined by the standard TensorFlow shape inference algorithm. Intermediate tensors are free to have dynamic shapes. If shape inference has failed, but the shape is known, it is possible to impose the correct shape using tf.set_shape().
So you could fix this by reformulating your model to have a fixed batch size, or by using tf.contrib.data.batch_and_drop_remainder as they suggest.
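For the batch-size route, here is a sketch of the input-pipeline side (TF 1.x contrib API, matching your snippet; filenames and parse_fn are placeholders for your own data source):

import tensorflow as tf

BATCH_SIZE = 8  # any fixed value works; it just has to be constant

dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(parse_fn)
# Unlike dataset.batch(), this drops the final short batch, so every
# batch fed to the TPU has a static shape of exactly BATCH_SIZE elements.
dataset = dataset.apply(tf.contrib.data.batch_and_drop_remainder(BATCH_SIZE))

If shape inference loses a dimension you know is fixed, you can also restore it inside parse_fn with something like image.set_shape([1024, 1024, 3]), per the tf.set_shape() note in the quote above.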
Answer 2:
Could you please share your input data function? It is hard to tell the exact issue, but it seems that the shape of the tensor representing the input sample is not static.
Source: https://stackoverflow.com/questions/52857901/mask-r-cnn-for-tpu-on-google-colab