I\'m new to machine learning and neural networks. I know how to build a nonlinear classification model, but my current problem has a continuous output. I\'ve been searching for
I don't want to be impolite, but the current answers are all related to nonlinear ND-polynomials resulting from linear activation functions. That simply doesn't make sense in terms of this question.
I get the point because you will have a polynomial as the objective function to minimize with coefficients that are products of layer coefficients and a product is nonlinear. Anyway, such a system will never be able to converge and doesn't make sense at all without any extra constraints.
The described system is not only completely unnecessarily nonlinear, but also ill-posed. Don't argue about stuff that leads ad absurdum. The original question actually completely nailed it.
Build a "linear neural network" with layers and try to use it as usual... then you will realise that this goes nowhere and you wasted your time. So unless there is good reasons to believe this kind of ill-posed stuff has been handled I would never ever consider using a linear activation function. If you have extra constraints this might make sense. If you use stochastic gradient descent then you will at least skip some bad properties of it.
That the objective function is nonlinear in its parameters gives an impression that is wrong and bogus. And if the writer would have known about optimisation problems connected to terms with a product of coefficients he would have never written anything like this.
Any objective function can be made nonlinear. If you just replace one linear coefficient with a product of two coefficients. But that is nonsense because you can never determine those coefficients. NEVER. There are infinitely many solutions! And that doesn't even depend on the amount of data.