The Adam optimizer adds several "momentum" terms to the gradient descent update, making the effective step size for each parameter adaptive:
Specifically, it keeps exponentially decaying running averages of past gradients and of past squared gradients, and uses bias-corrected versions of both when computing each update.
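For reference, a sketch of the standard Adam update rules (following Kingma & Ba); the symbols $\alpha$, $\beta_1$, $\beta_2$, and $\epsilon$ are the conventional hyperparameter names and are not introduced in the text above:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$

Here $g_t$ is the gradient at step $t$, $m_t$ and $v_t$ are the decaying first- and second-moment estimates (the "momentum" terms), $\hat{m}_t$ and $\hat{v}_t$ are their bias-corrected versions, and the division by $\sqrt{\hat{v}_t}$ is what makes the effective step size adapt to each parameter.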