I have read about LDA and I understand the mathematics of how the topics are generated when one inputs a collection of documents.
References say that LDA is an algorithm which
Splitting the data into training and testing sets is a common step in evaluating the performance of a learning algorithm. It's more clear-cut for supervised learning, wherein you train the model on the training set, then see how well its classifications on the test set match the true class labels. For unsupervised learning, such evaluation is a little trickier. In the case of topic modeling, a common measure of performance is perplexity. You train the model (like LDA) on the training set, and then you see how "perplexed" the model is on the testing set. More specifically, you measure how well the topics' word distributions account for the word counts of the test documents; lower perplexity means the model is less surprised by the held-out text.
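To make the perplexity idea concrete, here is a minimal sketch in plain Python. It assumes you already have the topic-word distributions from training and a topic mixture for each test document; in a real LDA pipeline those mixtures have to be inferred for the held-out documents, which this sketch skips. The function name `perplexity`, the argument names, and the toy numbers are all made up for illustration and do not come from any particular library.

```python
import math


def perplexity(doc_word_counts, doc_topic, topic_word):
    """Held-out perplexity: exp(-total log-likelihood / total token count).

    doc_word_counts: list of dicts {word_id: count}, one per test document
    doc_topic:       doc_topic[d][k] = p(topic k | document d)  (assumed given)
    topic_word:      topic_word[k][w] = p(word w | topic k)     (from training)
    """
    log_lik = 0.0
    n_tokens = 0
    for counts, theta in zip(doc_word_counts, doc_topic):
        for w, n in counts.items():
            # Probability of word w in this document: a mixture over topics.
            p_w = sum(t_k * phi_k[w] for t_k, phi_k in zip(theta, topic_word))
            log_lik += n * math.log(p_w)
            n_tokens += n
    return math.exp(-log_lik / n_tokens)


# Toy data (hypothetical): 2 topics over a 4-word vocabulary.
topic_word = [
    [0.4, 0.4, 0.1, 0.1],  # topic 0 favors words 0 and 1
    [0.1, 0.1, 0.4, 0.4],  # topic 1 favors words 2 and 3
]
test_docs = [{0: 3, 1: 2}, {2: 4, 3: 1}]

good_mix = [[0.9, 0.1], [0.1, 0.9]]  # mixtures that match the documents
bad_mix = [[0.1, 0.9], [0.9, 0.1]]   # mixtures that do not

print(perplexity(test_docs, good_mix, topic_word))
print(perplexity(test_docs, bad_mix, topic_word))
```

The mixtures that match the test documents give the lower perplexity. A model that spreads probability uniformly over a vocabulary of size V scores exactly V, which is a handy sanity check.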
Perplexity is good for relative comparisons between models or parameter settings, but its numeric value doesn't really mean much. I prefer to evaluate topic models using the following, somewhat manual, evaluation process:
I realize that this process isn't as nice and quantitative as one might like, but to be honest, the applications of topic models are rarely quantitative either. I suggest evaluating your topic model according to the problem you're applying it to.
Good luck!