What is the difference between an SVM and a neural network? Is it true that a linear SVM is the same as an NN, and that for non-linearly separable problems, an NN adds hidden layers and an SVM uses changi
Running a simple out-of-the-box comparison between support vector machines and neural networks (WITHOUT any parameter selection) on several popular regression and classification datasets demonstrates the practical differences. An SVM becomes a very slow predictor if it creates many support vectors, while a neural network predicts much faster and has a much smaller model size. On the other hand, training time is much shorter for SVMs. As for accuracy/loss, both methods are on par despite the theoretical drawbacks of neural networks, and for regression problems in particular neural networks often outperform support vector machines. Depending on your specific problem, this might help you choose the right model.
There are two parts to this question. The first part is "what is the form of function learned by these methods?" For NN and SVM this is typically the same. For example, a single hidden layer neural network uses exactly the same form of model as an SVM. That is:
Given an input vector x, the output is: output(x) = sum_over_all_i weight_i * nonlinear_function_i(x)
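To make the shared form concrete, here is a minimal sketch in plain Python. The helper names (rbf_unit, tanh_unit, output) and all numeric values are made up for illustration, and the bias terms that real SVMs and networks usually add to the sum are left out for brevity.

```python
import math

def rbf_unit(center, gamma):
    """Nonlinear function of an RBF-kernel SVM: exp(-gamma * ||x - center||^2)."""
    def f(x):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        return math.exp(-gamma * sq_dist)
    return f

def tanh_unit(w, b):
    """Nonlinear function of a hidden neuron: tanh(w . x + b)."""
    def f(x):
        return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return f

def output(x, weights, units):
    """output(x) = sum_over_all_i weight_i * nonlinear_function_i(x)"""
    return sum(w * f(x) for w, f in zip(weights, units))

# SVM-style model: one nonlinear function per support vector (made-up values).
svm_units = [rbf_unit((0.0, 0.0), gamma=1.0), rbf_unit((1.0, 1.0), gamma=1.0)]
svm_out = output((0.0, 0.0), [1.0, -1.0], svm_units)

# NN-style model: one nonlinear function per hidden neuron (made-up values).
nn_units = [tanh_unit((1.0, 0.0), 0.0), tanh_unit((0.0, 1.0), 0.0)]
nn_out = output((0.0, 0.0), [0.5, 0.5], nn_units)
```

Both models go through the same output function; only the choice of nonlinear functions and the way their parameters get fitted differ.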
Generally the nonlinear functions will also have some parameters. So these methods need to learn how many nonlinear functions should be used, what their parameters are, and what the value of all the weight_i weights should be.
Therefore, the difference between an SVM and an NN is in how they decide what these parameters should be set to. Usually when someone says they are using a neural network, they mean they are trying to find the parameters which minimize the mean squared prediction error with respect to a set of training examples. They will also almost always be using the stochastic gradient descent optimization algorithm to do this. SVMs, on the other hand, try to minimize both training error and some measure of "hypothesis complexity". So they will find a set of parameters that fits the data but is also "simple" in some sense. You can think of it like Occam's razor for machine learning. The most common optimization algorithm used with SVMs is sequential minimal optimization.
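As a rough sketch of the two objectives for a linear model, assuming the standard soft-margin formulation for the SVM (hinge loss plus a squared-norm penalty), something like the following shows the extra "complexity" term. The function names, the constant C, and the toy data are my own choices for illustration.

```python
def mean_squared_error(w, b, samples):
    """Average of (prediction - target)^2 for a linear model w . x + b."""
    total = 0.0
    for x, y in samples:
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        total += (pred - y) ** 2
    return total / len(samples)

def svm_objective(w, b, samples, C=1.0):
    """0.5 * ||w||^2 (complexity) + C * sum of hinge losses (training error)."""
    complexity = 0.5 * sum(wi * wi for wi in w)
    hinge = 0.0
    for x, y in samples:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        hinge += max(0.0, 1.0 - margin)
    return complexity + C * hinge

# Two 1-D training points with labels -1 and +1 (toy data).
samples = [((0.0,), -1), ((4.0,), 1)]

# Both parameter settings classify the data with zero hinge loss,
# but the SVM objective still prefers the one with the smaller weight.
small_w = svm_objective([0.5], -1.0, samples)
large_w = svm_objective([5.0], -10.0, samples)
```

Here the training-error part is zero for both settings, so the whole difference in the SVM objective comes from the complexity term.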
Another big difference between the two methods is that stochastic gradient descent isn't guaranteed to find the optimal set of parameters when used the way NN implementations employ it. However, any decent SVM implementation is going to find the optimal set of parameters. People like to say that neural networks get stuck in local minima while SVMs don't.
NNs are heuristic, while SVMs are theoretically founded. An SVM is guaranteed to converge towards the best solution in the PAC (probably approximately correct) sense. For example, for two linearly separable classes an SVM will draw the separating hyperplane directly halfway between the nearest points of the two classes (these become the support vectors). A neural network may settle on any line which separates the samples, which is correct for the training set but might not have the best generalization properties.
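A tiny 1-D illustration of that difference, as a sketch only: a classic perceptron stands in here for "a network that stops at any separating line" (it is not how modern networks are trained), and the max-margin boundary is computed directly as the midpoint between the closest points of the two classes. The data and function names are made up.

```python
def perceptron_1d(samples, epochs=1000):
    """Classic perceptron: stops at the first (w, b) that separates the data."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            if y * (w * x + b) <= 0:  # misclassified or on the boundary
                w += y * x
                b += y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b

def max_margin_threshold_1d(samples):
    """For separable 1-D data, the max-margin boundary is the midpoint
    between the closest points of the two classes."""
    highest_neg = max(x for x, y in samples if y == -1)
    lowest_pos = min(x for x, y in samples if y == 1)
    return (highest_neg + lowest_pos) / 2.0

# Class -1 sits at x = 0 and 1, class +1 at x = 4 and 5 (toy data).
samples = [(0.0, -1), (1.0, -1), (4.0, 1), (5.0, 1)]

w, b = perceptron_1d(samples)
perceptron_threshold = -b / w   # some point in the gap between 1 and 4
svm_threshold = max_margin_threshold_1d(samples)  # the midpoint of the gap
```

Both thresholds separate the training set, but only the max-margin one is pinned to the middle of the gap; where the perceptron ends up depends on the order of the samples and the starting point.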
So no, even for linearly separable problems NNs and SVMs are not the same.
In the case of classes that are not linearly separable, both SVMs and NNs apply a non-linear projection into a higher-dimensional space. For NNs this is achieved by introducing additional neurons in the hidden layer(s). For SVMs, a kernel function is used to the same effect. A neat property of the kernel function is that its computational cost doesn't rise with the number of dimensions of that space, while for NNs the cost obviously rises with the number of neurons.
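The kernel property can be checked by hand on a small case. The sketch below uses the homogeneous degree-2 polynomial kernel on 2-D inputs, chosen only because its explicit feature map is short enough to write out; the function names and input values are illustrative.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (3 output dimensions)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z)^2."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(z))  # inner product in the 3-D feature space
implicit = poly2_kernel(x, z)   # same value, computed in the original 2-D space
```

The kernel never builds phi(x); it gets the same inner product from a single dot product in the input space, which is why the cost does not grow with the dimension of the projected space.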
Both Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs) are supervised machine learning classifiers. An ANN is a parametric classifier that relies on hyperparameter tuning during the training phase. An SVM is a non-parametric classifier that finds a linear decision boundary (if a linear kernel is used) to separate the classes. In terms of model performance, SVMs are sometimes equivalent to a shallow neural network architecture. Generally, an ANN will outperform an SVM when there is a large number of training instances; however, neither outperforms the other over the full range of problems.
We can summarize the advantages of the ANN over the SVM as follows: ANNs can handle multi-class problems by producing probabilities for each class. In contrast, SVMs handle these problems using independent one-versus-all classifiers where each produces a single binary output. For example, a single ANN can be trained to solve the hand-written digits problem while 10 SVMs (one for each digit) are required.
Another advantage of ANNs, from the perspective of model size, is that the model is fixed in terms of its input nodes, hidden layers, and output nodes; in an SVM, however, the number of support vectors could reach the number of training instances in the worst case.
The SVM does not perform well when the number of features is greater than the number of samples. More work in feature engineering is required for an SVM than that needed for a multi-layer Neural Network.
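To put rough numbers on the model-size point above, here is a back-of-the-envelope sketch. The layer sizes and support-vector counts are invented, and the storage formula for the kernel SVM (each support vector plus one coefficient, plus a bias) is a simplifying assumption rather than how any particular library lays out its model.

```python
def mlp_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network, e.g. [64, 32, 10]."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def kernel_svm_stored_values(n_support_vectors, n_features):
    """Each support vector is stored with one coefficient, plus a single bias."""
    return n_support_vectors * (n_features + 1) + 1

# A small network with 64 inputs, 32 hidden units and 10 outputs:
# its size does not depend on how many training samples there are.
mlp_size = mlp_parameter_count([64, 32, 10])

# A kernel SVM on 64 features: size grows with the number of support vectors,
# which in the worst case equals the number of training samples.
svm_small = kernel_svm_stored_values(100, 64)
svm_worst = kernel_svm_stored_values(10_000, 64)
```

With these made-up numbers the network stores 2,410 values regardless of the dataset, while the SVM goes from 6,501 to 650,001 as the support-vector count goes from 100 to 10,000.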
On the other hand, SVMs are better than ANNs in certain respects:
In comparison to SVMs, ANNs are more prone to becoming trapped in local minima, meaning that they sometimes miss the global picture.
While most machine learning algorithms can overfit if they don’t have enough training samples, ANNs can also overfit if training goes on for too long - a problem that SVMs do not have.
SVM models are easier to understand. Beyond the classical linear kernel, there are other kernels that provide different levels of flexibility, such as the Radial Basis Function (RBF) kernel. Unlike the linear kernel, the RBF kernel can handle the case where the relation between class labels and attributes is nonlinear.
Actually, they are exactly equivalent to each other. The only difference is in their standard implementations, with their choices of activation function, regularization, and so on, which obviously differ from each other. Also, I have not yet seen a dual formulation for neural networks, but SVMs are moving toward the primal anyway.
Practically, most of your assumptions are often quite true. I'll elaborate: for linearly separable classes a linear SVM works quite well, and it's much faster to train. For classes that are not linearly separable there is the kernel trick, which sends your data to a higher-dimensional space. This trick, however, has two disadvantages compared to an NN. First, you have to search for the right parameters, because the classifier will only work if the two sets are linearly separable in the higher-dimensional space. Testing parameters is often done by grid search, which consumes a lot of CPU time. The other problem is that this whole technique is not as general as an NN (for example, for NLP it often results in a poor classifier).
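For a feel of why grid search gets expensive, here is a minimal sketch of the loop. The evaluate callback is hypothetical: in practice it would train a kernel SVM with the given parameters and return a validation score, but here a made-up formula stands in for it so the example runs without data or an ML library. The parameter names C and gamma and the grid values are just typical choices for an RBF kernel.

```python
from itertools import product

def grid_search(evaluate, grid):
    """Try every parameter combination and keep the best-scoring one.

    `evaluate` is a hypothetical callback that trains a model with the given
    parameters and returns a validation score (higher is better).
    """
    names = list(grid)
    best_params, best_score, n_trials = None, float("-inf"), 0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)  # one full training run per combination
        n_trials += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_trials

def toy_evaluate(C, gamma):
    """Stand-in score with a made-up formula; peaks at C=10, gamma=0.1."""
    return -((C - 10.0) ** 2) - (gamma - 0.1) ** 2

grid = {"C": [0.1, 1.0, 10.0, 100.0], "gamma": [0.001, 0.01, 0.1, 1.0]}
best_params, best_score, n_trials = grid_search(toy_evaluate, grid)
```

Even this small 4 x 4 grid means 16 training runs, and the count multiplies again with every extra parameter and with every cross-validation fold.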