InvalidArgumentError in restore: Assign requires shapes of both tensors to match

Posted by 我的梦境 on 2020-01-04 07:11:12

Question


First, I should mention that I am new to TensorFlow. I am working on an OCR project using CTC (Connectionist Temporal Classification) and LSTM (Long Short-Term Memory). Training has finished, but when I try to restore the session I get an error that is commonly reported online, with a different analysis offered each time.

The error is:

 2018-01-10 13:42:43.179534: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-01-10 13:42:43.179939: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: Quadro M4000
major: 5 minor: 2 memoryClockRate (GHz) 0.7725
pciBusID 0000:00:05.0
Total memory: 7.93GiB
Free memory: 7.56GiB
2018-01-10 13:42:43.179974: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2018-01-10 13:42:43.179986: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2018-01-10 13:42:43.180002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro M4000, pci bus id: 0000:00:05.0)
2018-01-10 13:42:43.316563: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.319682: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.332996: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.333927: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.334583: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.379830: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.380081: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.380189: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.380188: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.380343: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.380554: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
2018-01-10 13:42:43.415117: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]
Traceback (most recent call last):
  File "detect.py", line 62, in <module>
    print(detect(test_inputs, test_targets, test_seq_len))
  File "detect.py", line 23, in detect
    saver.restore(sess,'models/ocr.model-100000')
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1548, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]

Caused by op u'save/Assign_1', defined at:
  File "detect.py", line 62, in <module>
    print(detect(test_inputs, test_targets, test_seq_len))
  File "detect.py", line 20, in detect
    saver = tf.train.Saver()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1139, in __init__
    self.build()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1170, in build
    restore_sequentially=self._restore_sequentially)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 691, in build
    restore_sequentially, reshape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 419, in _AddRestoreOps
    assign_ops.append(saveable.restore(tensors, shapes))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 155, in restore
    self.op.get_shape().is_fully_defined())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/state_ops.py", line 271, in assign
    validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 45, in assign
    use_locking=use_locking, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [48] rhs shape= [5,5,1,48]
     [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_1, save/RestoreV2_1/_25)]]

I have traced the error to the call saver.restore(sess,'models/ocr.model-100000').

Many things could cause this. What I have tried so far:

  • Removed all the checkpoints saved by the previous training run and started over, but this was not enough.

  • Used the function print_tensors_in_checkpoint_file provided by TensorFlow; the checkpoint looks fine to me.

This is the output:

Variable    []
Variable_1  [5, 5, 1, 48]
Variable_1/Momentum [5, 5, 1, 48]
Variable_2  [48]
Variable_2/Momentum [48]
Variable_3  [5, 5, 48, 64]
Variable_3/Momentum [5, 5, 48, 64]
Variable_4  [64]
Variable_4/Momentum [64]
Variable_5  [5, 5, 64, 128]
Variable_5/Momentum [5, 5, 64, 128]
Variable_6  [128]
Variable_6/Momentum [128]
Variable_7  [65536, 256]
Variable_7/Momentum [65536, 256]
Variable_8  [256]
Variable_8/Momentum [256]
W   [128, 38]
W/Momentum  [128, 38]
b   [38]
b/Momentum  [38]
rnn/multi_rnn_cell/cell_0/lstm_cell/bias    [512]
rnn/multi_rnn_cell/cell_0/lstm_cell/bias/Momentum   [512]
rnn/multi_rnn_cell/cell_0/lstm_cell/kernel  [129, 512]
rnn/multi_rnn_cell/cell_0/lstm_cell/kernel/Momentum [129, 512]
rnn/multi_rnn_cell/cell_1/lstm_cell/bias    [512]
rnn/multi_rnn_cell/cell_1/lstm_cell/bias/Momentum   [512]
rnn/multi_rnn_cell/cell_1/lstm_cell/kernel  [256, 512]
rnn/multi_rnn_cell/cell_1/lstm_cell/kernel/Momentum [256, 512]
[<tf.Variable 'Variable:0' shape=(5, 5, 1, 48) dtype=float32_ref>, <tf.Variable 'Variable_1:0' shape=(48,) dtype=float32_ref>, <tf.Variable 'Variable_2:0' shape=(5, 5, 48, 64) dtype=float32_ref>, <tf.Variable 'Variable_3:0' shape=(64,) dtype=float32_ref>, <tf.Variable 'Variable_4:0' shape=(5, 5, 64, 128) dtype=float32_ref>, <tf.Variable 'Variable_5:0' shape=(128,) dtype=float32_ref>, <tf.Variable 'Variable_6:0' shape=(65536, 256) dtype=float32_ref>, <tf.Variable 'Variable_7:0' shape=(256,) dtype=float32_ref>, <tf.Variable 'rnn/multi_rnn_cell/cell_0/lstm_cell/kernel:0' shape=(129, 512) dtype=float32_ref>, <tf.Variable 'rnn/multi_rnn_cell/cell_0/lstm_cell/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'rnn/multi_rnn_cell/cell_1/lstm_cell/kernel:0' shape=(256, 512) dtype=float32_ref>, <tf.Variable 'rnn/multi_rnn_cell/cell_1/lstm_cell/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'W:0' shape=(128, 38) dtype=float32_ref>, <tf.Variable 'b:0' shape=(38,) dtype=float32_ref>]

I am curious how the saver determines the shapes, and how to debug this.


Answer 1:


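It looks like the order of variables in your graph definition changed at some point: in the saved checkpoint, [5, 5, 1, 48] is the shape of Variable_1 and [48] is the shape of Variable_2, while the graph you are restoring into has a Variable_1 of shape (48,). The checkpoint listing also starts with a scalar Variable of shape [] that has no counterpart in the variable list of the restoring graph, which would shift every auto-generated name by one.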
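The mismatch can be made explicit by comparing the two listings from the question name by name. The following is a minimal sketch in plain Python, not TensorFlow code: the shapes are copied by hand from the question (only the first four entries of each listing), and the helper name find_shape_mismatches is hypothetical. With a real checkpoint, the two dictionaries would be filled from the output of print_tensors_in_checkpoint_file and from the graph's variable list.

```python
# Shapes copied by hand from the question (first four entries only).
# checkpoint_shapes: what print_tensors_in_checkpoint_file reported.
# graph_shapes: the variables listed for the graph being restored into.
checkpoint_shapes = {
    "Variable": [],
    "Variable_1": [5, 5, 1, 48],
    "Variable_2": [48],
    "Variable_3": [5, 5, 48, 64],
}
graph_shapes = {
    "Variable": [5, 5, 1, 48],
    "Variable_1": [48],
    "Variable_2": [5, 5, 48, 64],
    "Variable_3": [64],
}


def find_shape_mismatches(checkpoint, graph):
    """Hypothetical helper: list every name present in both mappings
    whose saved shape differs from the shape in the current graph."""
    mismatches = []
    for name in sorted(graph):
        if name in checkpoint and list(checkpoint[name]) != list(graph[name]):
            mismatches.append((name, list(checkpoint[name]), list(graph[name])))
    return mismatches


mismatches = find_shape_mismatches(checkpoint_shapes, graph_shapes)
for name, saved, current in mismatches:
    print("%s: checkpoint %s vs graph %s" % (name, saved, current))
```

Every name in this excerpt is off by one position, and the Variable_1 entry reproduces the pair of shapes in the error message ([48] in the graph, [5,5,1,48] in the checkpoint).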

The names indicate that you did not give the variables explicit names, so they received the default names Variable, Variable_1, Variable_2, ... The suffix follows the order in which TensorFlow creates them, so if you swap two variables in the code (or add or remove one), they get different names. Earlier checkpoints can then no longer be restored, because TensorFlow finds a different tensor under the same name.
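To see why creation order matters, here is a toy model of the default naming scheme in plain Python. It is only an illustration, not TensorFlow's actual implementation: the function auto_names is hypothetical, and the assumption that training created a scalar variable first is inferred from the checkpoint listing in the question.

```python
def auto_names(shapes_in_creation_order):
    """Toy model (not TensorFlow code): hand out 'Variable', 'Variable_1',
    'Variable_2', ... in the order the variables are created."""
    names = {}
    for index, shape in enumerate(shapes_in_creation_order):
        name = "Variable" if index == 0 else "Variable_%d" % index
        names[name] = shape
    return names


# Assumed creation order at training time: a scalar first, then the
# first convolution's weights and bias (as the checkpoint listing suggests).
training_names = auto_names([[], [5, 5, 1, 48], [48]])

# Assumed creation order at restore time: the scalar is never created.
restore_names = auto_names([[5, 5, 1, 48], [48]])

print(training_names["Variable_1"])  # shape stored under this name in the checkpoint
print(restore_names["Variable_1"])   # shape the restoring graph expects under this name
```

The same name, Variable_1, ends up attached to the convolution weights in one run and to the bias in the other, which is the situation the saver reports.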

The best practice is to name each variable explicitly via the name argument:

W_conv1 = tf.Variable(..., name='W_conv1')
b_conv1 = tf.Variable(..., name='b_conv1')
...

This way the code will be more robust to small perturbations in the model.



Source: https://stackoverflow.com/questions/48186569/invalidargumenterror-in-restore-assign-requires-shapes-of-both-tensors-to-match
