Key not found in checkpoint TensorFlow

梦谈多话 2021-02-13 11:38

I'm using TensorFlow v1.1 and I've been trying to figure out how to use my EMA'ed weights for inference, but no matter what I do I keep getting the "Key not found in checkpoint" error.

2 Answers
  • 2021-02-13 11:48

    I'd like to add a way to make the best use of the trained variables that are already in the checkpoint.

    Keep in mind that every variable in the saver's var_list must be present in the checkpoint you configured. You can check the variables in the saver with:

    print(restore_vars)
    

    and those variables in the checkpoint by:

    # "model_ex1" is the checkpoint directory; this returns (name, shape) pairs
    vars_in_checkpoint = tf.train.list_variables("model_ex1")
    

    in your case.
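
    A minimal sketch of that comparison (assuming restore_vars is the var_list you pass to your saver, as above):

    # Checkpoint keys match v.op.name, i.e. the variable name without the ":0" suffix
    ckpt_names = {name for name, _ in vars_in_checkpoint}
    missing = [v.op.name for v in restore_vars if v.op.name not in ckpt_names]
    print("Not in the checkpoint:", missing)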

    If all of the restore_vars are included in vars_in_checkpoint, the error will not be raised; otherwise, initialize all variables first:

    # Grab every variable in the graph and initialize them all first
    all_variables = tf.get_collection_ref(tf.GraphKeys.GLOBAL_VARIABLES)
    sess.run(tf.variables_initializer(all_variables))
    

    All variables will now be initialized, whether or not they appear in the checkpoint. You can then filter out of restore_vars any variables that are not in the checkpoint (suppose every variable with ExponentialMovingAverage in its name is missing from the checkpoint):

    # Restore only the variables that are actually present in the checkpoint
    temp_saver = tf.train.Saver(
        var_list=[v for v in all_variables if "ExponentialMovingAverage" not in v.name])
    ckpt_state = tf.train.get_checkpoint_state("model_ex1")
    print('Loading checkpoint %s' % ckpt_state.model_checkpoint_path)
    temp_saver.restore(sess, ckpt_state.model_checkpoint_path)
    

    This may save some time compared with training the model from scratch. (In my scenario the restored variables gave no significant improvement over training from scratch at the beginning, since all of the old optimizer variables were discarded. But I think it can still accelerate optimization significantly, because it works like pretraining some of the variables.)

    In any case, some variables, such as embeddings and certain layers, are well worth restoring.
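
    For the original question of using EMA'ed weights at inference time, TF 1.x also offers ExponentialMovingAverage.variables_to_restore(), which maps the shadow-variable names stored in the checkpoint back to the live variables. A minimal sketch, assuming ema is created with the same decay as in training and "model_ex1/model.ckpt" stands in for your checkpoint prefix:

    import tensorflow as tf

    # Build the inference graph first, then create the EMA object;
    # decay=0.999 is assumed to match what was used during training.
    ema = tf.train.ExponentialMovingAverage(decay=0.999)
    # Keys look like "<var_name>/ExponentialMovingAverage" and map to live variables
    ema_saver = tf.train.Saver(ema.variables_to_restore())

    with tf.Session() as sess:
        ema_saver.restore(sess, "model_ex1/model.ckpt")  # placeholder path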

  • 2021-02-13 12:00

    The "Key not found in checkpoint" error means that the variable exists in your model in memory but not in the serialized checkpoint file on disk.

    You should use the inspect_checkpoint tool to see which tensors are actually saved in your checkpoint, and why some exponential moving averages are not being saved here.
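
    A minimal sketch of that inspection (the module path is TF 1.x's; "model_ex1/model.ckpt" is a placeholder for your checkpoint prefix):

    from tensorflow.python.tools.inspect_checkpoint import print_tensors_in_checkpoint_file

    print_tensors_in_checkpoint_file(
        file_name="model_ex1/model.ckpt",  # checkpoint prefix, not one physical file
        tensor_name="",                    # empty string: do not filter by name
        all_tensors=True)                  # print every saved tensor's name and value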

    It's not clear from your repro example which line is supposed to trigger the error.
