In supervised learning I have the typical train/test split for fitting and evaluating the algorithm, e.g. regression or classification. Regarding unsupervised learning, my question is: is a train/test split necessary and useful? If yes, why?
Well, this depends on the problem, the form of the dataset, and the class of unsupervised algorithm used to solve the particular problem.
Roughly: dimensionality reduction techniques are usually evaluated by calculating the reconstruction error, so there we can use a k-fold cross-validation procedure.
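A minimal sketch of that idea, assuming scikit-learn, PCA as the reduction technique, and mean squared reconstruction error as the score (these specific choices are mine, not stated in the answer):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold

X = load_iris().data
errors = []

# k-fold cross-validation on reconstruction error:
# fit PCA on the training fold, then measure how well it reconstructs the held-out fold.
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    pca = PCA(n_components=2).fit(X[train_idx])
    X_test = X[test_idx]
    X_rec = pca.inverse_transform(pca.transform(X_test))
    errors.append(np.mean((X_test - X_rec) ** 2))  # mean squared reconstruction error

print("mean reconstruction error over folds:", np.mean(errors))
```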
For clustering algorithms, however, I would suggest statistical testing in order to assess performance. There is also a somewhat time-consuming trick: split the dataset, hand-label the test set with meaningful classes, and cross-validate the clustering against those labels.
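A sketch of that hand-labelling trick might look like the following, assuming scikit-learn, KMeans as the clusterer, and the adjusted Rand index to compare predicted clusters against the hand-made labels (all illustrative choices, not part of the original answer; the iris targets stand in for the hand labels):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # y plays the role of hand-made labels on the test set

# Fit the clusterer on the train portion only.
X_train, X_test, _, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)

# Assign held-out points to the learned clusters and compare with the hand labels.
# The adjusted Rand index is invariant to label permutation, so cluster ids
# do not need to match the class ids.
test_clusters = km.predict(X_test)
print("adjusted Rand index on held-out, hand-labelled data:",
      adjusted_rand_score(y_test, test_clusters))
```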
In any case where an unsupervised algorithm is used on labelled (supervised) data, it is always good to cross-validate.
Overall: it is not strictly necessary to split the data into train and test sets, but if we can do it, it is always better.
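One common pattern that makes a held-out split useful even without any labels is scoring a probabilistic model on unseen data; a sketch, assuming scikit-learn and a Gaussian mixture whose `score` method returns the average per-sample log-likelihood (again an illustrative choice, not something the answer specifies):

```python
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

X = load_iris().data
X_train, X_test = train_test_split(X, test_size=0.3, random_state=0)

# Fit on the training split only, then check generalization via the
# average log-likelihood of the unseen test split.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X_train)
print("held-out average log-likelihood:", gmm.score(X_test))
```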
Here is an article which explains how cross-validation is a good tool for unsupervised learning: http://udini.proquest.com/view/cross-validation-for-unsupervised-pqid:1904931481/ and the full text is available here: http://arxiv.org/pdf/0909.3052.pdf
Source: https://stackoverflow.com/questions/31673388/is-train-test-split-in-unsupervised-learning-necessary-useful