Question:
How do you find an optimum learning rule for a given problem, say a multiple category classification?
I was thinking of using Genetic Algorithms, but I know there are issues surrounding performance. I am looking for real world examples where you have not used the textbook learning rules, and how you found those learning rules.
Answer 1:
Nice question BTW.
Classification algorithms can be compared along several characteristics, such as:
- What kind of data the algorithm is best suited to (what it "strongly prefers").
- Training overhead: does it take a long time to train?
- The data scale at which it is effective (small, medium, or large).
- The complexity of the analysis it can deliver.
Therefore, for your problem of classifying multiple categories I would use online logistic regression (trained with SGD), because it works well on small to medium data sets (less than tens of millions of training examples) and it is really fast.
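To make "online logistic regression from SGD" concrete, here is a minimal sketch in plain Python: multinomial (softmax) logistic regression updated one example at a time, on a hypothetical toy data set. The data, learning rate, and epoch count are illustrative assumptions; in practice you would use a library implementation such as scikit-learn's `SGDClassifier`.

```python
import math
import random

def softmax(z):
    m = max(z)                      # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sgd_train(X, y, n_classes, lr=0.5, epochs=200, seed=0):
    """Online multinomial logistic regression: one SGD step per example."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    W = [[0.0] * n_feat for _ in range(n_classes)]  # one weight row per class
    b = [0.0] * n_classes                           # one bias per class
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)            # visit examples in random order
        for i in idx:
            z = [sum(W[c][j] * X[i][j] for j in range(n_feat)) + b[c]
                 for c in range(n_classes)]
            p = softmax(z)
            for c in range(n_classes):
                # gradient of the log loss w.r.t. the logit: p_c - 1{c == y_i}
                g = p[c] - (1.0 if c == y[i] else 0.0)
                for j in range(n_feat):
                    W[c][j] -= lr * g * X[i][j]
                b[c] -= lr * g
    return W, b

def predict(W, b, x):
    z = [sum(wc[j] * x[j] for j in range(len(x))) + bc for wc, bc in zip(W, b)]
    return max(range(len(z)), key=lambda c: z[c])

# Toy 3-class problem: points clustered near three corners of the unit square.
X = [[0, 0], [0.2, 0.1], [1, 0], [0.9, 0.2], [0, 1], [0.1, 0.9]]
y = [0, 0, 1, 1, 2, 2]
W, b = sgd_train(X, y, n_classes=3)
print([predict(W, b, x) for x in X])
```

The per-example update is what makes it "online": you can stream training data through it without ever holding the full set in memory.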
Another Example:
Let's say you have to classify a large amount of text data. Then Naive Bayes is your baby, because it is particularly well suited to text analysis. SVM and SGD are faster and, in my experience, easier to train, but those are better applied when the data size is small or medium rather than large.
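For illustration, a bare-bones multinomial Naive Bayes text classifier looks like this. The spam/ham documents, the `alpha` smoothing value, and the helper names are all made up for the example, not taken from the answer:

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""
    classes = set(labels)
    vocab = {w for d in docs for w in d.split()}
    # log prior: fraction of documents in each class
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for d, l in zip(docs, labels):
        counts[l].update(d.split())
    loglik = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(vocab)
        loglik[c] = {w: math.log((counts[c][w] + alpha) / total) for w in vocab}
    return prior, loglik, vocab

def classify(text, prior, loglik, vocab):
    words = [w for w in text.split() if w in vocab]   # ignore unseen words
    return max(prior, key=lambda c: prior[c] + sum(loglik[c][w] for w in words))

docs = ["cheap pills buy now", "limited offer buy cheap",
        "meeting agenda attached", "project status meeting"]
labels = ["spam", "spam", "ham", "ham"]
prior, loglik, vocab = train_nb(docs, labels)
print(classify("buy cheap pills", prior, loglik, vocab))
```

Training is a single counting pass over the corpus, which is why Naive Bayes scales so comfortably to large text collections.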
In general, any data-mining practitioner will ask himself the four aforementioned questions before starting any ML or simple mining project.
After that, you have to measure the AUC (or any other relevant metric) to evaluate what you have done, because you might use more than one classifier in a single project. Sometimes, when you think you have found the perfect classifier, the results turn out to be poor under some evaluation metric, so you go back to those questions to find where you went wrong.
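The AUC mentioned above can be computed directly from its probabilistic definition; the scores below are invented sample data:

```python
def auc(labels, scores):
    """AUC = probability that a random positive example is scored above a
    random negative one (ties count half), computed from that definition."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 3 of the 4 positive/negative pairs are ranked correctly -> AUC = 0.75
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This O(P×N) pairwise form is fine for sanity checks; for large evaluation sets you would use a rank-based implementation such as `sklearn.metrics.roc_auc_score`.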
Hope that helped.
Answer 2:
When you input a vector x to the net, the net gives an output that depends on all the weights (the vector w). There will be an error between the output and the true answer. The average error e is a function of w, say e = F(w). Suppose you have a one-layer, two-dimensional network; then the surface of F may look like this:
[image from the original post: the error surface F(w) over the two weights]
When we talk about training, we are actually talking about finding the w that minimizes e. In other words, we are searching for the minimum of a function: to train is to search.
So your question is really how to choose the search method. My suggestion: it depends on what the surface of F(w) looks like. The wavier it is, the more randomized the method should be, because a simple method based on gradient descent has a bigger chance of getting trapped in a local minimum, so you lose the chance of finding the global minimum. On the other hand, if the surface of F(w) looks like one big pit, then forget the genetic algorithm; simple backpropagation, or anything based on gradient descent, would work very well.
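To illustrate the "big pit" case, here is plain gradient descent on a hypothetical bowl-shaped surface F(w) = (w0 - 3)² + (w1 + 1)², whose gradient I wrote out by hand; the learning rate and step count are arbitrary choices:

```python
def grad_descent(grad, w0, lr=0.1, steps=200):
    """Follow the negative gradient from a starting point w0."""
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

# "Big pit": F(w) = (w0 - 3)^2 + (w1 + 1)^2, so grad F = (2(w0-3), 2(w1+1)).
grad = lambda w: [2 * (w[0] - 3), 2 * (w[1] + 1)]
w = grad_descent(grad, [0.0, 0.0])
print(w)  # converges to the unique (global) minimum at (3, -1)
```

With a single basin there is no local minimum to get stuck in, so the simple method reaches the global minimum from any starting point; on a wavy surface the same loop would stop in whichever dip it happened to start near.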
You may ask: how can I know what the surface looks like? That is a matter of experience. Or you might randomly sample some values of w and compute F(w) to get an intuitive view of the surface.
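The random-sampling idea above can be sketched like this. The wavy test surface (a Rastrigin-style function) and the sampling range are assumptions for the demo, not anything from the answer:

```python
import math
import random

def F(w):
    # A deliberately "wavy" surface: quadratic bowl plus cosine ripples,
    # so it has one global minimum at the origin and many local minima.
    return sum(wi * wi - 10 * math.cos(2 * math.pi * wi) + 10 for wi in w)

rng = random.Random(0)
samples = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(1000)]
values = sorted(F(w) for w in samples)
# A wide spread between the best and worst sampled values, with many
# mediocre dips in between, hints that the surface is rugged.
print(values[0], values[-1])
```

If the sampled values vary smoothly with w, a gradient method is probably safe; if nearby samples jump around wildly, a more randomized search (simulated annealing, a genetic algorithm) is worth considering.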
Source: https://stackoverflow.com/questions/14176949/finding-an-optimum-learning-rule-for-an-ann