[![enter image description here][1]][1]
How can I derive the update rule for minimizing this loss using stochastic gradient descent with a given step size?