What's different about the momentum gradient update in TensorFlow and Theano?
I'm trying to use TensorFlow for my deep learning project. I need to implement my gradient update following this formula: classical momentum with an L2 weight-decay term, i.e. the velocity update `v <- momentum * v + (1 - momentum) * (gradient + weight_cost_strength * w)` followed by the parameter update `w <- w - learning_rate * v` (this is what the Theano code below computes). I have also implemented this update in Theano, and it gives the expected result. But when I use TensorFlow's `MomentumOptimizer`, the result is really bad. I don't know what is different between them.

Theano:

```python
import theano
import theano.tensor as T

def gradient_updates_momentum_L2(cost, params, learning_rate, momentum, weight_cost_strength):
    # Make sure momentum is a sane value
    assert momentum < 1 and momentum >= 0
    # List of update steps for each parameter
    updates = []
    # Just gradient descent on cost, with momentum and L2 weight decay
    for param in params:
        # Velocity for this parameter, initialized to zero
        param_update = theano.shared(param.get_value() * 0., broadcastable=param.broadcastable)
        # Move the parameter along the stored velocity
        updates.append((param, param - learning_rate * param_update))
        # Velocity: momentum-weighted mix of old velocity and the new
        # gradient, with the L2 penalty folded into the gradient
        updates.append((param_update, momentum * param_update
                        + (1. - momentum) * (T.grad(cost, param) + weight_cost_strength * param)))
    return updates
```
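On the TensorFlow side, this is roughly how I invoke the optimizer (a minimal sketch, not my exact code: the names `cost`, `learning_rate`, `momentum`, and `weight_cost_strength` are illustrative, and I assume the TF 1.x `tf.train` API; since `MomentumOptimizer` has no weight-decay argument, the L2 penalty is added to the cost by hand):

```python
import tensorflow as tf

# Fold the L2 penalty into the cost, since MomentumOptimizer
# itself has no weight-decay argument
l2_loss = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])
total_cost = cost + weight_cost_strength * l2_loss

# Per the TF 1.x docs, MomentumOptimizer updates as:
#   accumulation = momentum * accumulation + gradient
#   variable    -= learning_rate * accumulation
train_op = tf.train.MomentumOptimizer(learning_rate, momentum).minimize(total_cost)
```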