What are _get_hyper and _set_hyper in TensorFlow optimizers?

Posted by 魔方 西西 on 2020-06-16 04:18:22

Question


I see calls such as self._set_hyper('beta_1', beta_1) in the __init__ of, e.g., the Adam optimizer. _get_hyper and _serialize_hyperparameter also appear throughout the code. I don't see these in Keras optimizers; are they optional? When should or shouldn't they be used when creating custom optimizers?


Answer 1:


They enable setting and getting Python literals (int, str, etc.), callables, and tensors. They exist for convenience and consistency: anything set via _set_hyper can be retrieved via _get_hyper, which avoids repeating boilerplate code. I've implemented Keras AdamW in all major TF & Keras versions, and will use it as a reference.

  • t_cur is a tf.Variable. Each time we "set" it, we must invoke K.set_value; doing self.t_cur = 5 would replace the tf.Variable with a plain Python int and break the optimizer. If we instead used model.optimizer._set_hyper('t_cur', 5), it would be set appropriately, but this requires that t_cur was previously defined via _set_hyper.
  • Both _get_hyper and _set_hyper enable programmatic treatment of attributes: e.g., we can loop over a list of attribute names and get or set each one using just _get_hyper and _set_hyper, whereas otherwise we'd need to code conditionals and type checks. Note that _get_hyper(name) requires that name was previously set via _set_hyper.

  • _get_hyper enables typecasting via dtype=. For example, beta_1_t in the default Adam is cast to the same numeric type as var (e.g. a layer weight), which some ops require. Again, this is a convenience, as we could typecast manually (math_ops.cast).

  • _set_hyper enables the use of _serialize_hyperparameter, which retrieves the Python values (int, float, etc.) of callables, tensors, or already-Python values. The name stems from the need to convert tensors and callables to plain Python values for, e.g., pickling or JSON serialization, but it can also be used as a convenient way to inspect tensor values in Graph execution.

  • Lastly, everything instantiated via _set_hyper is stored in the optimizer._hyper dictionary, which is then iterated over in _create_hypers. The else branch in that loop converts all Python numerics to tensors, so _set_hyper will not create int, float, etc. attributes. Worth noting is the aggregation= kwarg, whose documentation reads: "Indicates how a distributed variable will be aggregated". This is the part that goes a bit beyond "for convenience" (it would take a lot of code to replicate).

    • _set_hyper has a limitation: it does not allow specifying a dtype. If the add_weight approach in _create_hypers is desired with a dtype, then add_weight should be called directly.
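
The bookkeeping described in the bullets above can be illustrated with a toy, pure-Python model. This is only a rough sketch and not TensorFlow's code: ToyVariable and ToyOptimizer are hypothetical names invented here, ToyVariable merely stands in for tf.Variable, and the branching is simplified compared with the real methods (no LearningRateSchedule handling, no distribution strategy, no aggregation).

```python
# Toy stand-in for the hyperparameter bookkeeping; NOT TensorFlow code.
# ToyVariable and ToyOptimizer are hypothetical names used for illustration.

class ToyVariable:
    """Minimal mutable container playing the role of tf.Variable."""

    def __init__(self, value):
        self.value = value

    def assign(self, value):
        # Update in place so that existing references stay valid.
        self.value = value


class ToyOptimizer:
    def __init__(self, learning_rate=0.001, beta_1=0.9, epsilon=1e-7):
        self._hyper = {}
        self._hypers_created = False
        self._set_hyper("learning_rate", learning_rate)
        self._set_hyper("beta_1", beta_1)
        # Plain attribute: it never needs to live in a variable.
        self.epsilon = epsilon

    def _set_hyper(self, name, value):
        prev = self._hyper.get(name)
        if isinstance(prev, ToyVariable) and not callable(value):
            prev.assign(value)         # keep the same variable object
        else:
            self._hyper[name] = value  # first set, or replacing a literal/callable

    def _create_hypers(self):
        if self._hypers_created:
            return
        for name, value in sorted(self._hyper.items()):
            if isinstance(value, ToyVariable) or callable(value):
                continue
            # The "else" case: wrap Python numerics in a variable.
            self._hyper[name] = ToyVariable(value)
        self._hypers_created = True

    def _get_hyper(self, name, dtype=None):
        self._create_hypers()
        value = self._hyper[name]
        if callable(value):
            value = value()
        raw = value.value if isinstance(value, ToyVariable) else value
        # With dtype, return a cast copy; otherwise return the stored object.
        return dtype(raw) if dtype is not None else value

    def _serialize_hyperparameter(self, name):
        value = self._hyper[name]
        if callable(value):
            return value()
        if isinstance(value, ToyVariable):
            return value.value
        return value


opt = ToyOptimizer(learning_rate=0.01)
names = ["learning_rate", "beta_1"]

# Loop over names without any type checks at the call site.
before = {n: opt._serialize_hyperparameter(n) for n in names}

lr_var = opt._get_hyper("learning_rate")  # triggers _create_hypers
opt._set_hyper("learning_rate", 0.5)      # assigns in place
same_object = opt._get_hyper("learning_rate") is lr_var

print(before, same_object, lr_var.value)
```

The point of the sketch is the identity check: once the variable exists, setting through _set_hyper updates it in place, whereas overwriting the attribute directly would swap the variable for a plain number. In the toy model epsilon is kept as an ordinary attribute, mirroring the epsilon example in the answer.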

When to use vs. not use: use them if the attribute is used by the optimizer via TensorFlow ops, i.e. if it needs to be a tf.Variable. By contrast, epsilon is set as a regular attribute, since it's never needed as a tensor variable.



Source: https://stackoverflow.com/questions/62042342/what-are-get-hyper-and-set-hyper-in-tensorflow-optimizers
