Dynamic graphs in TensorFlow


Question


I would like to implement a 2D LSTM as in this paper; specifically, I would like to do so dynamically, using tf.while_loop. In brief, this network works as follows.

  • order the pixels in an image so that pixel (i, j) maps to sequence position i * width + j
  • run a 2D LSTM over this sequence

    The difference between a 2D LSTM and a regular LSTM is that, in addition to the recurrent connection to the previous element in the sequence, there is a connection to the pixel directly above the current pixel; so at pixel (i, j) there are connections to (i - 1, j) and (i, j - 1). A small sketch of this indexing is given just after this list.
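
As a rough illustration of the indexing (my own sketch, not code from the paper), the raster-scan position and the two recurrent neighbours of a pixel can be written as:

    def sequence_index(i, j, width):
        #pixel (i, j) becomes element i * width + j of the sequence
        return i * width + j

    def recurrent_neighbours(i, j, width):
        #the 2D LSTM at (i, j) receives state from the pixel above and the pixel
        #to its left; in the top row or left column there is no such neighbour
        above = sequence_index(i - 1, j, width) if i > 0 else None
        left = sequence_index(i, j - 1, width) if j > 0 else None
        return above, left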

What I have done

I have tried to do this using tf.while_loop where, in each iteration of the loop, I accumulate the activations and cell states into tensors whose shapes I allow to vary. This is what the following block of code tries to do.

def single_lstm_layer(inputs, height, width, units, direction = 'tl'):
    with tf.variable_scope(direction) as scope:
        #Get 2D lstm cell
        cell = lstm_cell

        #position in sequence
        row, col = tf.to_int32(0), tf.to_int32(0)

        #use for when i - 1 < 0 or j - 1 < 0
        zero_state = tf.fill([1, units], 0.0)

        #get first activation and cell_state
        output, state = cell(inputs.read(row * width + col), zero_state, zero_state, zero_state, zero_state)

        #these are currently of shape (1, units) and will ultimately be of shape
        #(height * width, units)
        activations = output
        cell_states = state
        col += 1

    with tf.variable_scope(direction, reuse = True) as scope:

        def loop_fn(activations, cell_states, row, col):
            #Read next input in sequence
            i = inputs.read(row * width + col)

            #if we are not in the first row then we want to get the activation/cell_state
            #above us. Otherwise use zero state.
            hidden_state_t = tf.cond(tf.greater_equal(row - 1, 0), 
                                    lambda:tf.gather(activations, [(row - 1) * (width) + col]),
                                    lambda:tf.identity(zero_state))
            cell_state_t = tf.cond(tf.greater_equal(row - 1, 0), 
                                    lambda:tf.gather(cell_states, [(row - 1) * (width) + col]),
                                    lambda:tf.identity(zero_state))

            #if we are not in the first col then we want to get the activation/cell_state
            #left of us. Otherwise use zero state.
            hidden_state_l = tf.cond(tf.greater_equal(col - 1, 0), 
                                    lambda:tf.gather(activations, [row * (width) + col - 1]),
                                    lambda:tf.identity(zero_state))
            cell_state_l = tf.cond(tf.greater_equal(col - 1, 0), 
                                    lambda:tf.gather(cell_states, [row * (width) + col - 1]),
                                    lambda:tf.identity(zero_state))

            #Using previous activations/cell_states get current activation/cell_state
            output, state = cell(i, hidden_state_l, hidden_state_t, cell_state_l, cell_state_t)

            #Append to bottom, will increase number of rows by 1
            activations = tf.concat(0, [activations, output])
            cell_states = tf.concat(0, [cell_states, state])

            #move to next item in sequence
            col = tf.cond(tf.equal(col, width - 1), lambda:tf.mul(col, 0), lambda:tf.add(col, 1))
            row = tf.cond(tf.equal(col, 0), lambda:tf.add(row, 1), lambda:tf.identity(row))
            return activations, cell_states, row, col,
        row, col = tf.to_int32(0), tf.constant(1)
        activations, cell_states, _, _ = tf.while_loop(
                                              cond = lambda activations, cell_states, row, col: tf.logical_and(tf.less_equal(row , (height - 1)), tf.less_equal(col, width -1)) ,
                                              body = loop_fn,
                                              loop_vars = (activations,   
                                                        cell_states, 
                                                        row, 
                                                        col),
                                              shape_invariants = (tf.TensorShape((None, units)), 
                                                                tf.TensorShape((None, units)),
                                                                tf.TensorShape([]),
                                                                tf.TensorShape([]),
                                                                ),
                                                        )
        #Return activations with shape [height, width, units]
        return tf.pack(tf.split(0, height, activations))

This works, at least in the forward direction. That is to say, if I look at what is returned in a session, I get what I want: a 3D tensor, call it T, of shape [height, width, units], where T[i, j, :] contains the activation of the LSTM cell at input (i, j).
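
For reference, this is roughly how I check the forward pass (the inputs TensorArray and feed_dict are constructed elsewhere in my script, so treat this as a sketch):

    T = single_lstm_layer(inputs, 28, 28, 32)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        t_val = sess.run(T, feed_dict = feed_dict)
        print(t_val.shape)   #(28, 28, 32), as expected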

I would then like to classify each pixel. For this purpose I apply conv2d across T, reshape the result into [height * width, num_labels], and construct the cross-entropy loss.

    #convolve the activations with W (defined elsewhere) to get per-pixel logits
    T = tf.nn.conv2d(T, W, strides = [1, 1, 1, 1], padding = 'VALID')
    #reshape so that there is one row of logits per pixel
    T = tf.reshape(T, [height * width, num_labels])

    #average cross-entropy over all pixels
    loss = tf.reduce_mean(
                        tf.nn.softmax_cross_entropy_with_logits(
                        labels = tf.reshape(labels, [height * width, num_labels]), 
                        logits = T)
                        )
    optimizer = tf.train.AdagradOptimizer(0.01).minimize(loss)

The problem

However, when I now try this with a 28 x 28 image and 32 units,

    sess.run(optimizer, feed_dict = feed_dict)

I get the following error:

File "Assignment2/train_model.py", line 52, in <module>
    train_models()
  File "/Assignment2/train_model.py", line 12, in train_models
    image, out, labels, optomizer, accuracy, prediction, ac = build_graph(28, 28)
  File "/Assignment2/multidimensional.py", line 101, in build_graph
    optimizer = tf.train.AdagradOptimizer(0.01).minimize(loss)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 196, in minimize
    grad_loss=grad_loss)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 253, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients.py", line 491, in gradients
    in_grad.set_shape(t_in.get_shape())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 408, in set_shape
    self._shape = self._shape.merge_with(shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_shape.py", line 579, in merge_with
    (self, other))
ValueError: Shapes (784, 32) and (1, 32) are not compatible

I think this is a problem with calculating the gradients resulting from the tf.while_loop, but I am pretty lost at this point.

Source: https://stackoverflow.com/questions/42313828/dynamic-graphs-in-tensorflow
