LDA and topic model

Anonymous (unverified), submitted 2019-12-03 08:54:24

Question:

I have been studying LDA and topic models for several weeks, but because my mathematics is weak I cannot fully understand the underlying algorithms. I used the GibbsLDA implementation, fed it a large set of documents, and set the number of topics to 100. It produced a file named "final.theta", which stores the proportion of each topic in each document. This result is good, and I can use the topic proportions for many other things. When I tried Blei's C implementation of LDA, however, I only got a file named "final.gamma", and I don't know how to transform it into topic proportions. Can anyone help? I have also learned that LDA has many improved variants (such as CTM and HLDA). Is there a topic model similar to LDA that, given a large set of documents, directly outputs the topic proportions of each document? Thank you very much!

Answer 1:

I think the problem with the Blei implementation is that you're running variational inference with:

$ lda inf [args...]

when what you want is topic estimation, with:

$ lda est [args...]

Once this finishes, there will be a file "final.beta" in either the current directory or the directory given by the optional last argument. Then run the Python script "topics.py", which is included in the tarball, on that file. The readme at http://www.cs.princeton.edu/~blei/lda-c/readme.txt describes all of this, especially sections B and D.
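To make that last step less of a black box, here is a minimal sketch of the kind of processing involved. It is not the topics.py script itself. It assumes that "final.beta" holds one row per topic with one log word probability per vocabulary term, and that the vocabulary is available as a list of words in the same order. The function name top_words is hypothetical.

```python
def top_words(log_beta, vocab, n=10):
    """For each topic row, return the n words with the largest log-probability.

    log_beta: list of rows, one per topic (assumed layout of final.beta).
    vocab:    list of words, index-aligned with the columns of log_beta.
    """
    result = []
    for row in log_beta:
        # Rank vocabulary indices by descending log-probability.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        result.append([vocab[i] for i in ranked[:n]])
    return result


if __name__ == "__main__":
    # Toy data standing in for a parsed final.beta and vocabulary file.
    toy_beta = [[-1.0, -3.0, -2.0], [-5.0, -0.1, -4.0]]
    toy_vocab = ["apple", "bank", "court"]
    for k, words in enumerate(top_words(toy_beta, toy_vocab, n=2)):
        print(k, " ".join(words))
```

Note that this gives the most probable words per topic, not the per-document topic proportions asked about in the question.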

(If this still doesn't make sense, let me know)

As for improvements such as CTM: I don't know anything about HLDA, but I have used both LDA and CTM in the past, and I can say that neither is strictly better than the other. Each works better on different data. CTM (the correlated topic model) assumes that topics are correlated with one another, and it uses that assumption to improve the results as long as it holds.

Hope this helps!



Answer 2:

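To get E[θ], just normalize the gammas within each row. Each row of final.gamma holds the parameters of the Dirichlet posterior over topics for one document, and the mean of a Dirichlet distribution is its parameter vector divided by the sum of its entries.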
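The row normalization can be sketched in a few lines of plain Python. This is an illustration, not part of lda-c. It assumes "final.gamma" has one line per document with one whitespace-separated value per topic. The function names read_gamma and normalize_rows are hypothetical.

```python
def read_gamma(path):
    """Parse a gamma file into a list of rows of floats.

    Assumption: one document per line, one whitespace-separated value per topic.
    """
    with open(path) as f:
        return [[float(x) for x in line.split()] for line in f if line.strip()]


def normalize_rows(gamma):
    """Return E[theta] for each document: every row divided by its own sum."""
    theta = []
    for row in gamma:
        total = sum(row)
        theta.append([g / total for g in row])
    return theta


if __name__ == "__main__":
    # Toy gamma values for two documents and three topics.
    example = [[1.0, 1.0, 2.0], [0.5, 4.5, 5.0]]
    for row in normalize_rows(example):
        print(" ".join(f"{p:.3f}" for p in row))
```

With a real run, normalize_rows(read_gamma("final.gamma")) would give one row of topic proportions per document, playing the same role as the "final.theta" file from GibbsLDA.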


