Topic Modeling: How do I use my fitted LDA model to predict new topics for a new dataset in R?

Posted by 假如想象 on 2019-11-29 03:51:33

Question


I am using the 'lda' package in R for topic modeling. I want to use a fitted Latent Dirichlet Allocation (LDA) model to predict topics (collections of related words in a document) for a new dataset. While looking into this, I came across the predictive.distribution() function. However, it takes document_sums as an input parameter, and document_sums is itself part of the result returned by fitting a model. I need help understanding how to apply an existing model to a new dataset and predict topics. Here is the example code from the package documentation, written by Jonathan Chang:

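# Load the package, then fit a model on the bundled Cora data
library(lda)
data(cora.documents)
data(cora.vocab)

K <- 10  # Number of topics (clusters)

# Positional arguments after the vocabulary: 25 iterations, alpha = 0.1, eta = 0.1
result <- lda.collapsed.gibbs.sampler(cora.documents, K, cora.vocab, 25, 0.1, 0.1)

# Predict new words for the first two documents
predictions <- predictive.distribution(result$document_sums[, 1:2], result$topics, 0.1, 0.1)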

# Use top.topic.words to show the top 5 predictions in each document.
top.topic.words(t(predictions), 5)
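
For reference, here is a minimal base R sketch of what the predictive.distribution() call above appears to compute, going by my reading of the package documentation: each document's smoothed topic proportions are used to mix the smoothed topic-word distributions. The toy count matrices, the four-word vocabulary, and the names phi, theta, and sketch_predictions are made up for illustration and are not part of the package.

```r
# Toy counts standing in for a fitted model (all values are made up).
# topics: K x V matrix of word-to-topic assignment counts.
topics <- matrix(c(8, 1, 0, 1,
                   0, 2, 7, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(NULL, c("gene", "model", "graph", "data")))

# document_sums: K x D matrix of per-document topic counts (one column per document).
document_sums <- matrix(c(5, 1,
                          0, 4),
                        nrow = 2, byrow = TRUE)
alpha <- 0.1
eta <- 0.1

# Smoothed topic-word distributions: each row sums to 1.
phi <- (topics + eta) / rowSums(topics + eta)

# Smoothed per-document topic proportions: each column sums to 1.
smoothed <- document_sums + alpha
theta <- sweep(smoothed, 2, colSums(smoothed), "/")

# Mix the topics by the proportions: a V x D matrix with one
# word distribution per document.
sketch_predictions <- t(phi) %*% theta
```

This shows why the function needs document_sums: it only mixes existing counts and does not itself assign topics to unseen documents.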

Any help will be appreciated

Thanks & Regards,

Ankit


Answer 1:


I don't know how to achieve this in R, but have a look at the 2009 paper by Wallach et al. titled 'Evaluation Methods for Topic Models'. Section 4 describes three methods for calculating P(z|w): one based on importance sampling, and two others called the 'Chib-style estimator' and the 'left-to-right estimator'.

Mallet has an implementation of the left-to-right estimator.
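
A simpler alternative to those estimators, if the goal is only to get topic proportions for unseen documents, is "fold-in" Gibbs sampling: keep the fitted topic-word distributions fixed and resample only the topic assignments of the new document's tokens. The sketch below is a base R illustration of that idea, not the lda package's own code. The function name fold_in_document, the toy phi matrix, and the four-word vocabulary are hypothetical. If I recall the package help page correctly, lda.collapsed.gibbs.sampler() also accepts initial and freeze.topics arguments meant for sampling test documents, but please verify that against the documentation for your installed version.

```r
# Fold-in Gibbs sampling: infer topic counts for ONE new document while the
# fitted topic-word distributions stay fixed.
# words: integer vector of 1-based vocabulary indices, one entry per token.
# phi:   K x V matrix of topic-word probabilities (each row sums to 1).
fold_in_document <- function(words, phi, alpha, n_iter = 200) {
  K <- nrow(phi)
  z <- sample.int(K, length(words), replace = TRUE)  # random initial topics
  counts <- tabulate(z, nbins = K)                   # topic counts in this document
  for (iter in seq_len(n_iter)) {
    for (i in seq_along(words)) {
      counts[z[i]] <- counts[z[i]] - 1               # remove token i from the counts
      # P(z_i = k | rest) is proportional to (n_dk + alpha) * phi[k, w_i]
      p <- (counts + alpha) * phi[, words[i]]
      z[i] <- sample.int(K, 1, prob = p)
      counts[z[i]] <- counts[z[i]] + 1               # add token i back
    }
  }
  counts  # topic counts from the final sweep, analogous to one column of document_sums
}

# Toy usage with made-up values: two topics over a four-word vocabulary.
set.seed(1)
vocab <- c("gene", "model", "graph", "data")
phi <- rbind(c(0.90, 0.04, 0.02, 0.04),
             c(0.02, 0.04, 0.90, 0.04))
new_doc <- match(c("graph", "graph", "data", "graph", "model", "graph"), vocab)

new_counts <- fold_in_document(new_doc, phi, alpha = 0.1)
new_theta <- (new_counts + 0.1) / sum(new_counts + 0.1)  # smoothed topic proportions
```

With a real fitted model, phi would be the smoothed, row-normalized version of result$topics, and the resulting counts could be passed on in the same role as a column of document_sums.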



Source: https://stackoverflow.com/questions/10483349/topic-modeling-how-do-i-use-my-fitted-lda-model-to-predict-new-topics-for-a-new
