Suppose we have weights
x = tf.Variable(np.random.random((5,10)))
cost = ...
And we use the GD optimizer:
upds = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
You can take the Lagrangian approach and simply add a penalty for values of the variable you don't want.
For example, to encourage theta to be non-negative, you could add the following to the optimizer's objective function:
added_loss = -tf.minimum(tf.reduce_min(theta), 0)
If any entries of theta are negative, then added_loss will be positive; otherwise it will be zero. Scaling it to a meaningful value is left as an exercise to the reader: scale it too little and it will not exert enough pressure, too much and it may make training unstable. A sketch of wiring this into the cost follows below.
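As a minimal TF1-style sketch (the stand-in cost, the learning rate, and penalty_weight below are placeholder assumptions, not part of the original question), the penalty can be scaled and added to the existing cost before handing the total to the optimizer:

import numpy as np
import tensorflow as tf

theta = tf.Variable(np.random.random((5, 10)))
cost = tf.reduce_sum(tf.square(theta))  # placeholder for your actual cost

# Zero when every entry of theta is non-negative,
# equal to the magnitude of the most negative entry otherwise.
added_loss = -tf.minimum(tf.reduce_min(theta), 0)

penalty_weight = 10.0  # hypothetical scale; tune per the caveats above
total_cost = cost + penalty_weight * added_loss

upds = tf.train.GradientDescentOptimizer(0.01).minimize(total_cost)

Running upds repeatedly in a session then minimizes the original cost while the penalty term pushes negative entries of theta back toward zero.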