Question
I implemented a network with TensorFlow and created the model with the following code:
def multilayer_perceptron(x, weights, biases):
    layer_1 = tf.add(tf.matmul(x, weights["h1"]), biases["b1"])
    layer_1 = tf.nn.relu(layer_1)
    out_layer = tf.add(tf.matmul(layer_1, weights["out"]), biases["out"])
    return out_layer
I initialize the weights and the biases as follows:
weights = {
    "h1": tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    "out": tf.Variable(tf.random_normal([n_hidden_1, n_classes]))
}
biases = {
    "b1": tf.Variable(tf.random_normal([n_hidden_1])),
    "out": tf.Variable(tf.random_normal([n_classes]))
}
Now I want to use a custom activation function. Therefore I replaced tf.nn.relu(layer_1) with a custom activation function custom_sigmoid(layer_1), which is defined as:
def custom_sigmoid(x):
    beta = tf.Variable(tf.random_normal([int(x.get_shape()[1])]))
    return tf.sigmoid(beta * x)
where beta is a trainable parameter. I realized that this cannot work, since I don't know how to implement the derivative such that TensorFlow can use it.
Question: How can I use a custom activation function in TensorFlow? I would really appreciate any help.
Answer 1:
That's the beauty of automatic differentiation! You don't need to know how to compute the derivative of your function, as long as you build it only from TensorFlow constructs that are inherently differentiable (a few TensorFlow functions are simply not differentiable).
For everything else the derivative is computed for you by TensorFlow; any combination of inherently differentiable operations can be used, and you never need to think about the gradient yourself. You can validate this by calling tf.gradients in a test case to show that TensorFlow is computing the gradient with respect to your cost function.
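For example, a minimal sketch of such a check (the names x, beta, and cost below are placeholders I made up for illustration, not from the original post) could look like this:

import tensorflow as tf  # TF 1.x style, matching the rest of the post

x = tf.placeholder(tf.float32, [None, 4])
beta = tf.Variable(tf.random_normal([4]))      # trainable scaling parameter
cost = tf.reduce_mean(tf.sigmoid(beta * x))    # toy cost built from differentiable ops

# tf.gradients returns a gradient tensor when the graph is differentiable
# with respect to beta, and [None] when it is not.
grad = tf.gradients(cost, [beta])
print(grad)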
Here's a really nice explanation of automatic differentiation for the curious:
https://alexey.radul.name/ideas/2013/introduction-to-automatic-differentiation/
You can make sure that beta is a trainable parameter by checking that it exists in the collection tf.GraphKeys.TRAINABLE_VARIABLES; being in that collection means the optimizer will compute its derivative w.r.t. the cost and update it (if it's not in that collection, you should investigate).
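As a rough illustration (again assuming a tf.Variable named beta already exists in the graph), that check could be:

# List everything the optimizer will update; beta should appear here.
trainable = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
print(any(v is beta for v in trainable))  # True if beta will be trained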
Answer 2:
I'll try to answer my own question. Here is what I did, and it seems to work:
First I define a custom activation function:
def custom_sigmoid(x, beta_weights):
    return tf.sigmoid(beta_weights * x)
Then I create weights for the activation function:
beta_weights = {
    "beta1": tf.Variable(tf.random_normal([n_hidden_1]))
}
Finally I add beta_weights
to my model function and replace the activation function in multilayer_perceptron()
:
def multilayer_perceptron(x, weights, biases, beta_weights):
    layer_1 = tf.add(tf.matmul(x, weights["h1"]), biases["b1"])
    #layer_1 = tf.nn.relu(layer_1)  # Old
    layer_1 = custom_sigmoid(layer_1, beta_weights["beta1"])  # New
    out_layer = tf.add(tf.matmul(layer_1, weights["out"]), biases["out"])
    return out_layer
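For completeness, here is a sketch of how the pieces above might be wired into a training step; the placeholders, the softmax cross-entropy cost, and the Adam learning rate are my assumptions, not part of the original answer:

x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])

logits = multilayer_perceptron(x, weights, biases, beta_weights)
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))

# Because "beta1" is a tf.Variable, it is in TRAINABLE_VARIABLES and the
# optimizer updates it together with the layer weights and biases.
train_op = tf.train.AdamOptimizer(0.001).minimize(cost)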
Source: https://stackoverflow.com/questions/49923958/tensorflow-custom-activation-function