What is the benefit of using gradient descent for linear regression? It looks like we can solve the problem (finding the theta_0 through theta_n that minimize the cost function) analytically.
Another reason is that gradient descent is immediately useful when you generalize linear regression, especially when the problem has no closed-form solution, as in Lasso (which adds a regularization term consisting of the sum of the absolute values of the weights).
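To make this concrete, here is a minimal NumPy sketch, not a definitive implementation. It assumes a cost of the form (1/2n)·||Xθ − y||² + λ·||θ||₁ on synthetic data; the function names, the data, and the value λ = 0.1 are all made up for illustration. With λ = 0 the loop is plain gradient descent and should reproduce the least-squares solution. With λ > 0 there is no closed form, and because the absolute-value term is not differentiable at zero, the sketch uses a proximal (soft-thresholding) step after each gradient step, which is a common variant rather than vanilla gradient descent.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrink each entry toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient_descent(X, y, lam=0.0, steps=5000):
    # Hypothetical helper: minimizes (1/2n)||X theta - y||^2 + lam * ||theta||_1.
    # With lam == 0 the soft-threshold is the identity, so this is plain
    # gradient descent on the least-squares cost.
    n, d = X.shape
    # Step size 1/L, where L = sigma_max(X)^2 / n bounds the curvature
    # of the smooth (squared-error) part of the cost.
    lr = n / np.linalg.norm(X, 2) ** 2
    theta = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / n          # gradient of the smooth part
        theta = soft_threshold(theta - lr * grad, lr * lam)
    return theta

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_theta = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ true_theta + 0.1 * rng.normal(size=200)

# Ordinary least squares: a closed-form (analytic) solution exists.
theta_closed = np.linalg.lstsq(X, y, rcond=None)[0]

# The same problem solved iteratively.
theta_gd = proximal_gradient_descent(X, y, lam=0.0)

# Lasso: no closed form in general, but the iterative method carries over.
theta_lasso = proximal_gradient_descent(X, y, lam=0.1)

print("closed form:", theta_closed)
print("gradient descent:", theta_gd)
print("lasso:", theta_lasso)
```

The only change between the two iterative calls is the value of `lam`, which is the point of the answer: the iterative approach extends to the regularized problem, while the analytic solution does not.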