Tensorflow load pre-trained model use different optimizer

问题

I want to load a pre-trained model (optimized by AdadeltaOptimizer) and continue training with SGD (GradientDescentOptimizer). The models are saved and loaded with tensorlayer API:

save model:

import tensorlayer as tl
tl.files.save_npz(network.all_params,
                  name=model_dir + "model-%d.npz" % global_step)

load model:

load_params = tl.files.load_npz(path=resume_dir + '/', name=model_name)
tl.files.assign_params(sess, load_params, network)

If I continue training with adadelta, the training loss (cross entropy) looks normal (start at a close value as the loaded model). However, if I change the optimizer to SGD, the training loss would be as large as a newly initialized model.

I took a look at the model-xxx.npz file from tl.files.save_npz. It only saves all model parameters as ndarray. I'm not sure how the optimizer or learning rate is involved here.

回答1:

You probably would have to import the tensor into a variable which is the loss function/cross-entropy that feeds into your Adam Optimizer previously. Now, just feed it through your SGD optimizer instead.

saver = tf.train.import_meta_graph('filename.meta')
saver.restore(sess,tf.train.latest_checkpoint('./'))
graph = tf.get_default_graph()
cross_entropy = graph.get_tensor_by_name("entropy:0") #Tensor to import

optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)

In this case, I have tagged the cross-entropy Tensor before training my pre-train model with the name entropy, as such

tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv), name = 'entropy')

If you are unable to make changes to your pretrain model, you can obtain the list of Tensors in your model(after you have imported it) from graph and deduce which Tensor you require. I have no experience with Tensorlayer, so this guide is to provide more of an understanding. You can take a look at Tensorlayer-Layers, they should explain how to obtain your Tensor. As Tensorlayer is built on top of Tensorflow, most of the functions should still be available.

回答2:

You can specify the parameters you want to save in your checkpoint file.

save_npz([save_list, name, sess])

In the save_list you're specifying only the network parameters that don't contain the optimizer parameters, thus no learning rate or any other optimizer parameters.

If you want to save the current learning rate (in order to use the same exact learning rate when you restore the model) you have to add it to the save_list, like that:

save_npz(network.all_params.extend([learning_rate])

(I suppoose that all_params is an array, I guess my supposition is correct.

Since you want to change the optimizer, I suggest you save the learning_rate only as optimizer parameter and not any other variable that the optimizer creates. In that way, you'll be able to change the optimizer and restoring the model, otherwise (if you put in your checkpoint any other variable) the graph you'll try to restore won't find the variables in which place the saved value and you won't be able to change it.

回答3:

https://tensorlayer.readthedocs.io/en/latest/user/get_start_advance.html#pre-trained-cnn

vgg = tl.models.vgg16(pretrained=True)
img = tl.vis.read_image('data/tiger.jpeg')
img = tl.prepro.imresize(img, (224, 224)).astype(np.float32) / 255
output = vgg(img, is_train=False)

For 2.0 version, use this

来源：https://stackoverflow.com/questions/44710772/tensorflow-load-pre-trained-model-use-different-optimizer

标签

optimization

tensorflow

pre-trained-model