问题
I am attempting to create a custom, Dense layer in Keras to tie weights in an Autoencoder. I have tried following an example for doing this in convolutional layers here, but it seemed like some of the steps did not apply for the Dense layer (also, the code is from over two years ago).
By tying weights, I want the decode layer to use the transposed weight matrix of the encode layer. This approach is also taken in this article (page 5). Below is the relevant quote from the article:
Here, we choose both the encoding and decoding activation function to be sigmoid function and only consider the tied weights case, in which W ′ = WT (where WT is the transpose of W ) as most existing deep learning methods do.
In the quote above, W is the weight matrix in the encode layer and W' (equal to the transpose of W) is the weight matrix in the decode layer.
I did not change too much in the dense layer. I added a tied_to
parameter to the constructor, which allows you to pass the layer you want to tie it to. The only other change was to the build
function, the snippet for this is below:
def build(self, input_shape):
assert len(input_shape) >= 2
input_dim = input_shape[-1]
if self.tied_to is not None:
self.kernel = K.transpose(self.tied_to.kernel)
self._non_trainable_weights.append(self.kernel)
else:
self.kernel = self.add_weight(shape=(input_dim, self.units),
initializer=self.kernel_initializer,
name='kernel',
regularizer=self.kernel_regularizer,
constraint=self.kernel_constraint)
if self.use_bias:
self.bias = self.add_weight(shape=(self.units,),
initializer=self.bias_initializer,
name='bias',
regularizer=self.bias_regularizer,
constraint=self.bias_constraint)
else:
self.bias = None
self.input_spec = InputSpec(min_ndim=2, axes={-1: input_dim})
self.built = True
Below is the __init__
method, the only change here was the addition of the tied_to
parameter.
def __init__(self, units,
activation=None,
use_bias=True,
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
tied_to=None,
**kwargs):
if 'input_shape' not in kwargs and 'input_dim' in kwargs:
kwargs['input_shape'] = (kwargs.pop('input_dim'),)
super(Dense, self).__init__(**kwargs)
self.units = units
self.activation = activations.get(activation)
self.use_bias = use_bias
self.kernel_initializer = initializers.get(kernel_initializer)
self.bias_initializer = initializers.get(bias_initializer)
self.kernel_regularizer = regularizers.get(kernel_regularizer)
self.bias_regularizer = regularizers.get(bias_regularizer)
self.activity_regularizer = regularizers.get(activity_regularizer)
self.kernel_constraint = constraints.get(kernel_constraint)
self.bias_constraint = constraints.get(bias_constraint)
self.input_spec = InputSpec(min_ndim=2)
self.supports_masking = True
self.tied_to = tied_to
The call
function was not edited, but it is below for reference.
def call(self, inputs):
output = K.dot(inputs, self.kernel)
if self.use_bias:
output = K.bias_add(output, self.bias, data_format='channels_last')
if self.activation is not None:
output = self.activation(output)
return output
Above, I added a conditional to check if the tied_to
parameter was set, and if so, set the layer's kernel to the transpose of the tied_to
layer's kernel.
Below is the code used to instantiate the model. It is done using Keras's sequential API and DenseTied
is my custom layer.
# encoder
#
encoded1 = Dense(2, activation="sigmoid")
decoded1 = DenseTied(4, activation="sigmoid", tied_to=encoded1)
# autoencoder
#
autoencoder = Sequential()
autoencoder.add(encoded1)
autoencoder.add(decoded1)
After training the model, below is the model summary and weights.
autoencoder.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_7 (Dense) (None, 2) 10
_________________________________________________________________
dense_tied_7 (DenseTied) (None, 4) 12
=================================================================
Total params: 22
Trainable params: 14
Non-trainable params: 8
________________________________________________________________
autoencoder.layers[0].get_weights()[0]
array([[-2.122982 , 0.43029135],
[-2.1772149 , 0.16689162],
[-1.0465667 , 0.9828905 ],
[-0.6830663 , 0.0512633 ]], dtype=float32)
autoencoder.layers[-1].get_weights()[1]
array([[-0.6521988 , -0.7131109 , 0.14814234, 0.26533198],
[ 0.04387903, -0.22077179, 0.517225 , -0.21583867]],
dtype=float32)
As you can see, the weights reported by autoencoder.get_weights()
do not seem to be tied.
So after showing my approach, my question is, is this a valid way to tie weights in a Dense Keras layer? I was able to run the code, and it is currently training. It seems that the loss function is decreasing reasonably as well. My fear is that this will only set them equal when the model is build, but not actually tie them. My hope is that the backend transpose
function is tying them through references under the hood, but I am sure that I am missing something.
回答1:
So after showing my approach, my question is, is this a valid way to tie weights in a Dense Keras layer?
Yes, it's valid.
My fear is that this will only set them equal when the model is build, but not actually tie them. My hope is that the backend transpose function is tying them through references under the hood, but I am sure that I am missing something.
It actually ties them in a computation graph, you can check in printing model.summary()
that there's just one copy of these trainable weights. Also, after training your model you can check weights of corresponding layers with model.get_weights()
. When the model is build there're no weights yet actually, just placeholders for them.
random.seed(1)
class DenseTied(Layer):
def __init__(self, units,
activation=None,
use_bias=True,
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
tied_to=None,
**kwargs):
self.tied_to = tied_to
if 'input_shape' not in kwargs and 'input_dim' in kwargs:
kwargs['input_shape'] = (kwargs.pop('input_dim'),)
super().__init__(**kwargs)
self.units = units
self.activation = activations.get(activation)
self.use_bias = use_bias
self.kernel_initializer = initializers.get(kernel_initializer)
self.bias_initializer = initializers.get(bias_initializer)
self.kernel_regularizer = regularizers.get(kernel_regularizer)
self.bias_regularizer = regularizers.get(bias_regularizer)
self.activity_regularizer = regularizers.get(activity_regularizer)
self.kernel_constraint = constraints.get(kernel_constraint)
self.bias_constraint = constraints.get(bias_constraint)
self.input_spec = InputSpec(min_ndim=2)
self.supports_masking = True
def build(self, input_shape):
assert len(input_shape) >= 2
input_dim = input_shape[-1]
if self.tied_to is not None:
self.kernel = K.transpose(self.tied_to.kernel)
self._non_trainable_weights.append(self.kernel)
else:
self.kernel = self.add_weight(shape=(input_dim, self.units),
initializer=self.kernel_initializer,
name='kernel',
regularizer=self.kernel_regularizer,
constraint=self.kernel_constraint)
if self.use_bias:
self.bias = self.add_weight(shape=(self.units,),
initializer=self.bias_initializer,
name='bias',
regularizer=self.bias_regularizer,
constraint=self.bias_constraint)
else:
self.bias = None
self.built = True
def compute_output_shape(self, input_shape):
assert input_shape and len(input_shape) >= 2
assert input_shape[-1] == self.units
output_shape = list(input_shape)
output_shape[-1] = self.units
return tuple(output_shape)
def call(self, inputs):
output = K.dot(inputs, self.kernel)
if self.use_bias:
output = K.bias_add(output, self.bias, data_format='channels_last')
if self.activation is not None:
output = self.activation(output)
return output
# input_ = Input(shape=(16,), dtype=np.float32)
# encoder
#
encoded1 = Dense(4, activation="sigmoid", input_shape=(4,), use_bias=True)
decoded1 = DenseTied(4, activation="sigmoid", tied_to=encoded1, use_bias=False)
# autoencoder
#
autoencoder = Sequential()
# autoencoder.add(input_)
autoencoder.add(encoded1)
autoencoder.add(decoded1)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
print(autoencoder.summary())
autoencoder.fit(x=np.random.rand(100, 4), y=np.random.randint(0, 1, size=(100, 4)))
print(autoencoder.layers[0].get_weights()[0])
print(autoencoder.layers[1].get_weights()[0])
回答2:
Thanks Mikhail Berlinkov, One imporant remark: This code runs under Keras, but not in eager mode in TF2.0. It runs, but it trains badly.
The critical point is, how the object stores the transposed weight. self.kernel = K.transpose(self.tied_to.kernel)
In non eager mode this creates a graph the right way. In eager mode this fails, probably because the value of a transposed variable is stored at build (== the first call), and then used at subsequent calls.
However: the solution is to store the variable unaltered at build, and put the transpose operation into the call method.
I spent several days to figure this out, and I am happy if this helps anyone.
来源:https://stackoverflow.com/questions/53751024/tying-autoencoder-weights-in-a-dense-keras-layer