Building multiple models in the same graph

Question

I am trying to build two similar models that predict different output types: one chooses between two categories and the other between six. Both take the same inputs and both are LSTM RNNs.

Each model lives in its own file (model1.py and model2.py), with training and prediction split into separate functions.

I made the mistake of giving the variables in both models the same names, so when I call predict1 and predict2 from model1 and model2 respectively, I get the following namespace error: ValueError: Variable W already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

Where W is the name of the matrix of weights.

Is there a good way of running these predictions from the same place? I have tried renaming the variables involved but still get the following error. It doesn't seem to be possible to name an lstm_cell at its creation, is it?

ValueError: Variable RNN/BasicLSTMCell/Linear/Matrix already exists

EDIT: After adding scopes around model1pred and model2pred in the predictions file, I get the following error when calling model1pred() and then model2pred():

tensorflow.python.framework.errors.NotFoundError: Tensor name "model1/model1/BasicLSTMCell/Linear/Matrix" not found in checkpoint files './variables/model1.chk'

EDIT: The code is included here. The code in model2.py is omitted, but it is equivalent to model1.py except that n_classes=2 and the scope inside the dynamicRNN function and inside pred is set to 'model2'.

SOLUTION: The problem was that the graph the saver was trying to restore into still included variables from the first pred() execution. Wrapping each call to a pred function in its own graph solved the issue and removed the need for variable scoping.

In the file that collects predictions:

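import tensorflow as tf

def model1pred(test_x, test_seqlen):
    from model1 import pred
    # Build and restore model1 in its own graph so its variables stay isolated
    with tf.Graph().as_default():
        return pred(test_x, test_seqlen)

def model2pred(test_x, test_seqlen):
    from model2 import pred
    # A second, separate graph avoids clashing with model1's variable names
    with tf.Graph().as_default():
        return pred(test_x, test_seqlen)

# Load test_x and test_seq (the sequence lengths) here

probs1, preds1 = model1pred(test_x, test_seq)
probs2, cpreds2 = model2pred(test_x, test_seq)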

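In model1.py:

# Imports assumed for the TensorFlow 0.x API used below; they are not shown in the original post.
import numpy as np
import tensorflow as tf
from tensorflow.python.ops import rnn_cell

def dynamicRNN(x, seqlen, weights, biases):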
    n_steps = 10
    n_input = 14
    n_classes = 6
    n_hidden = 100

    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Required shape: 'n_steps' tensors list of shape (batch_size, n_input)

    # Permuting batch_size and n_steps
    x = tf.transpose(x, [1, 0, 2])
    # Reshaping to (n_steps*batch_size, n_input)
    x = tf.reshape(x, [-1,n_input])
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    x = tf.split(0, n_steps, x)

    # Define a lstm cell with tensorflow
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)

    # Get lstm cell output, providing 'sequence_length' will perform dynamic calculation.
    outputs, states = tf.nn.rnn(lstm_cell, x, dtype=tf.float32, sequence_length=seqlen)

    # When performing dynamic calculation, we must retrieve the last
    # dynamically computed output, i.e., if a sequence length is 10, we need
    # to retrieve the 10th output.
    # However, TensorFlow doesn't support advanced indexing yet, so for each
    # sample in the batch we use its length to compute the position of the
    # relevant output and gather it.

    # 'outputs' is a list of outputs at every timestep; we pack them in a Tensor
    # and change the dimensions back to [batch_size, n_steps, n_hidden]
    outputs = tf.pack(outputs)
    outputs = tf.transpose(outputs, [1, 0, 2])

    # Hack to build the indexing and retrieve the right output.
    batch_size = tf.shape(outputs)[0]
    # Start indices for each sample
    index = tf.range(0, batch_size) * n_steps + (seqlen - 1)
    # Indexing
    outputs = tf.gather(tf.reshape(outputs, [-1, n_hidden]), index)

    # Linear activation, using outputs computed above
    return tf.matmul(outputs, weights['out']) + biases['out']

def pred(test_x, test_seqlen):
    with tf.Session() as sess:
        n_steps = 10
        n_input = 14
        n_classes = 6
        n_hidden = 100
        weights = {'out': tf.Variable(tf.random_normal([n_hidden, n_classes]), name='W1')}
        biases = {'out': tf.Variable(tf.random_normal([n_classes]), name='b1')}
        x = tf.placeholder("float", [None, n_steps, n_input])
        y = tf.placeholder("float", [None, n_classes])
        seqlen = tf.placeholder(tf.int32, [None])

        pred = dynamicRNN(x, seqlen, weights, biases)
        saver = tf.train.Saver(tf.all_variables())
        y_p = tf.argmax(pred, 1)

        init = tf.initialize_all_variables()
        sess.run(init)

        saver.restore(sess,'./variables/model1.chk')
        y_prob, y_pred = sess.run([pred, y_p], feed_dict={x: test_x, seqlen: test_seqlen})
        # softmax is a helper not shown in the post; it converts each row of logits to probabilities
        y_prob = np.array([softmax(row) for row in y_prob])
        return y_prob, y_pred
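
The index arithmetic in dynamicRNN can be checked by hand. The sketch below is a plain-Python illustration, using nested lists instead of tensors and made-up sizes (2 sequences, 3 steps, hidden size 1), of why row i * n_steps + (seqlen[i] - 1) of the flattened outputs is the last valid step of sequence i. The function name last_relevant_rows is hypothetical and does not appear in the original code.

```python
def last_relevant_rows(outputs, seqlen):
    """Pick the output at step seqlen[i] - 1 for each sequence i.

    outputs: nested list shaped [batch_size][n_steps][n_hidden]
    seqlen:  list of true sequence lengths, one per batch entry
    """
    batch_size = len(outputs)
    n_steps = len(outputs[0])
    # Flatten to [batch_size * n_steps][n_hidden],
    # mirroring tf.reshape(outputs, [-1, n_hidden])
    flat = [step for sequence in outputs for step in sequence]
    # Same formula as in dynamicRNN: start of sequence i in the flat list,
    # plus the offset of its last valid step
    index = [i * n_steps + (seqlen[i] - 1) for i in range(batch_size)]
    return [flat[j] for j in index]


# Toy data chosen for readability: 2 sequences, 3 steps, hidden size 1
outputs = [
    [[10], [11], [12]],  # sequence 0
    [[20], [21], [22]],  # sequence 1
]
result = last_relevant_rows(outputs, seqlen=[2, 3])
print(result)  # [[11], [22]]
```

Sequence 0 has length 2, so its flat index is 0 * 3 + 1 = 1; sequence 1 has length 3, so its flat index is 1 * 3 + 2 = 5.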

Answer 1:

You can do this by adding with tf.variable_scope(): blocks around the two pieces of model construction code. Each block prefixes the names of the variables created inside it with its own scope name, which avoids the clash.

For example (using the model1pred() and model2pred() functions defined in your question):

with tf.variable_scope('model1'):
  # Variables created in here will be named 'model1/W', etc.
  probs1, preds1 = model1pred(test_x, test_seq)

with tf.variable_scope('model2'):
  # Variables created in here will be named 'model2/W', etc.
  probs2, cpreds2 = model2pred(test_x, test_seq)

For more details, see the in-depth HOWTO on variable sharing in TensorFlow.
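
One caveat, visible in the NotFoundError quoted in the question: wrapping construction in a scope changes every variable's name, so a checkpoint saved without that scope no longer matches the graph. The sketch below works on name strings only and makes no TensorFlow calls; the helper name checkpoint_name_map and the example variable names are hypothetical. It shows the kind of mapping from checkpoint names to scoped graph names that would be needed in that situation.

```python
def checkpoint_name_map(graph_names, scope):
    """Map the unscoped names stored in a checkpoint to scoped graph names.

    graph_names: variable names as they appear in the graph, e.g. 'model1/W1'
    scope:       the variable_scope prefix that was added at prediction time
    """
    prefix = scope + "/"
    mapping = {}
    for name in graph_names:
        if not name.startswith(prefix):
            raise ValueError("%r is not inside scope %r" % (name, scope))
        # Drop only the leading scope, so 'model1/W1' becomes 'W1'
        mapping[name[len(prefix):]] = name
    return mapping


# Hypothetical names: the graph was built inside variable_scope('model1'),
# while the checkpoint is assumed to have been saved without that scope.
graph_names = ["model1/W1", "model1/b1"]
name_map = checkpoint_name_map(graph_names, "model1")
print(name_map)
```

tf.train.Saver accepts a dictionary keyed by checkpoint names in place of a plain variable list, so a mapping like this, with the actual variables as values rather than their names, is one possible way to restore an unscoped checkpoint into a scoped graph. Building each model in its own tf.Graph, as in the question's SOLUTION, avoids the renaming altogether.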

Source: https://stackoverflow.com/questions/38750635/building-multiple-models-in-the-same-graph

Tags

tensorflow

recurrent-neural-network

lstm