Dynamic topic models/topic over time in R [closed]

Submitted by 喜欢而已 on 2020-06-25 06:54:11

Question


I have a database of newspaper articles about water policy from 1998 to 2008, and I would like to see how the newspapers' coverage changes over this period. Should I use a Dynamic Topic Model (DTM) or a Topics over Time (ToT) model for this task? Would they be significantly better than a traditional LDA model (where I fit the topic model on the entire corpus and then plot each topic's trend based on how each document is tagged)? If so, is there an R package I could use for DTM/ToT?
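For reference, the LDA baseline described in the question (fit one model on the whole corpus, then track topics over time) usually comes down to averaging each document's estimated topic proportions within each time period. A minimal Python sketch, assuming the per-document proportions have already been estimated by some LDA implementation; the function name `topic_trend` and the toy numbers are hypothetical:

```python
from collections import defaultdict

def topic_trend(doc_years, doc_topics):
    """Average per-document topic proportions within each year.

    doc_years:  publication year of each document
    doc_topics: each document's topic proportions (one list of K floats per doc)
    Returns {year: [mean share of topic k for k in range(K)]}, ordered by year.
    """
    totals = {}
    counts = defaultdict(int)
    for year, theta in zip(doc_years, doc_topics):
        if year not in totals:
            totals[year] = [0.0] * len(theta)
        for k, share in enumerate(theta):
            totals[year][k] += share
        counts[year] += 1
    return {year: [s / counts[year] for s in totals[year]] for year in sorted(totals)}

# Hypothetical toy input: three articles, two topics.
years = [1998, 1998, 1999]
thetas = [[1.0, 0.0], [0.5, 0.5], [0.25, 0.75]]
print(topic_trend(years, thetas))  # {1998: [0.75, 0.25], 1999: [0.25, 0.75]}
```

Counting each document's single most probable topic instead of averaging is the other common variant. In both cases the topics themselves stay fixed across the whole period; only their shares move.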


Answer 1:


So it depends on what your research question is.

A dynamic topic model allows the words that are most strongly associated with a given topic to vary over time. The paper that introduces the model gives a great example of this using articles from the journal Science [1]. If you are interested in whether the characteristics of individual topics change over time, then this is the right approach.
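To make "the words associated with a topic vary over time" concrete: in the DTM of [1], each topic's natural parameters over the vocabulary follow a Gaussian random walk from one time slice to the next, and the word probabilities in each slice are the softmax of those parameters. The Python sketch below only simulates that prior for a single topic (it does no inference on real text); the vocabulary, `sigma`, and the function names are hypothetical:

```python
import math
import random

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def simulate_topic_drift(vocab, n_slices, sigma=0.5, seed=0):
    """Simulate one DTM-style topic across time slices.

    beta_t = beta_{t-1} + Gaussian noise with sd sigma (independently per word);
    the topic's word distribution in slice t is softmax(beta_t).
    """
    rng = random.Random(seed)
    beta = [rng.gauss(0.0, 1.0) for _ in vocab]  # natural parameters in the first slice
    slices = []
    for _ in range(n_slices):
        slices.append(softmax(beta))
        beta = [b + rng.gauss(0.0, sigma) for b in beta]
    return slices

vocab = ["drought", "reservoir", "irrigation", "pricing", "conservation"]
for year, probs in zip(range(1998, 2009), simulate_topic_drift(vocab, 11)):
    top_two = sorted(zip(probs, vocab), reverse=True)[:2]
    print(year, [word for _, word in top_two])
```

Setting `sigma` to 0 freezes the topic's word distribution, which is effectively what a static LDA model assumes.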

I have not worked with the ToT model before, but it appears similar to a structural topic model with a continuous time covariate. This means that the topics are fixed, but their relative prevalence and correlations can vary. If you group your articles into, say, months, then a structural or ToT model can show you whether certain topics become more or less prevalent over time.
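For intuition on the ToT side: in the Topics over Time model of Wang and McCallum, each topic also carries a Beta distribution over normalized timestamps, whose parameters are re-estimated by the method of moments, and a topic's weight at time t is proportional to its overall weight times that Beta density. The Python sketch below shows only that time component, not the full Gibbs sampler; the year-to-(0, 1) mapping, the "early"/"late" timestamps, the equal priors, and all function names are assumptions for illustration:

```python
import math
from statistics import mean, pvariance

def normalize_years(years, start=1998, end=2008):
    # Map each year to the centre of its bin inside (0, 1) so the Beta density is defined.
    span = end - start + 1
    return [(y - start + 0.5) / span for y in years]

def fit_beta_moments(ts):
    """Method-of-moments Beta(a, b) fit to timestamps in (0, 1)."""
    m, v = mean(ts), pvariance(ts)
    if v == 0:
        raise ValueError("need timestamps that are not all identical")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

def beta_pdf(t, a, b):
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))

def topic_weights_at(t, priors, params):
    # p(topic k | time t) is proportional to p(k) * Beta(t; a_k, b_k).
    scores = [p * beta_pdf(t, a, b) for p, (a, b) in zip(priors, params)]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical timestamps of words assigned to an "early" and a "late" topic.
early = fit_beta_moments(normalize_years([1998, 1999, 1999, 2000, 2000, 2001]))
late = fit_beta_moments(normalize_years([2005, 2006, 2006, 2007, 2007, 2008]))
for year in (1999, 2003, 2007):
    t = normalize_years([year])[0]
    print(year, [round(w, 3) for w in topic_weights_at(t, [0.5, 0.5], [early, late])])
```

Because time enters only through these per-topic densities, the word lists of the topics never change; what changes is which topics dominate at a given date.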

So, in sum: do you want the variation to be within topics or between topics? Do you want to study how the articles vary in the topics they cover, or how these articles construct particular topics?

In terms of R, you'll run into some problems. The stm package can fit an STM with discrete time periods as a covariate, but I am not aware of any pre-packaged implementation of a ToT model. For a DTM, there is a C++ implementation that was released with the introductory paper, and I have a Python version that I can find for you.

Note: I would never recommend using plain LDA for text documents. I would always take a correlated topic model as the base and build from there.
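The practical difference between a correlated topic model [3] and plain LDA lies in the prior on each document's topic proportions. Under LDA's Dirichlet prior, the proportions of any two topics are always negatively correlated, whereas the CTM draws topic log-odds from a multivariate normal and maps them through a softmax (a logistic-normal prior), so topics can also rise and fall together. A small Python simulation of the two priors; the 3-topic covariance (`rho = 0.9` between topics 0 and 1), the sample size, and the helper names are illustrative assumptions, not estimates from any corpus:

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def dirichlet_sample(rng, alphas):
    # Normalized independent Gamma draws give a Dirichlet sample (LDA's prior).
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def logistic_normal_sample(rng, rho):
    # eta ~ N(0, Sigma) with Sigma = [[1, rho, 0], [rho, 1, 0], [0, 0, 1]],
    # drawn via the Cholesky factor of Sigma written out by hand (CTM-style prior).
    z0, z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    eta = [z0, rho * z0 + math.sqrt(1 - rho ** 2) * z1, z2]
    return softmax(eta)

rng = random.Random(42)
n = 5000
lda = [dirichlet_sample(rng, [1.0, 1.0, 1.0]) for _ in range(n)]
ctm = [logistic_normal_sample(rng, 0.9) for _ in range(n)]
lda_corr = pearson([t[0] for t in lda], [t[1] for t in lda])
ctm_corr = pearson([t[0] for t in ctm], [t[1] for t in ctm])
print("Dirichlet corr(theta0, theta1):", round(lda_corr, 2))
print("Logistic-normal corr(theta0, theta1):", round(ctm_corr, 2))
```

For a symmetric Dirichlet over three topics the theoretical correlation is -0.5, so the first printed value should land near that; the second comes out positive because the prior lets topics 0 and 1 co-occur.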

Edit: some more detail on the stm package.

This package is an implementation of the structural topic model (STM) [2]. The STM extends the correlated topic model [3] by permitting covariates at the document level, so you can explore the relationship between topic prevalence and those covariates. If you include a covariate for date, you can explore how individual topics become more or less important over time, relative to the others. The package itself is excellent, fast and intuitive, and includes functions for choosing an appropriate number of topics (e.g. searchK()).
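Roughly, in the STM [2] a document's topic log-odds are a linear function of its covariates plus correlated noise, and the topic proportions are the softmax of those log-odds. In R this corresponds to passing a formula such as `prevalence = ~ year` to `stm()` and then summarising the effect with `estimateEffect()`. The Python sketch below only evaluates the noise-free part of such a model for a single covariate (year), using made-up coefficients, a centring year of 2003, and the last topic as a reference category with log-odds fixed at 0; all of these are assumptions, and it is not a fitted model:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def prevalence_at(year, gamma, centre=2003):
    """Topic proportions implied by a linear prevalence model at a given year.

    gamma: one hypothetical (intercept, slope) pair per non-reference topic;
    the last topic is the reference category with its log-odds fixed at 0.
    Returns the softmax of the mean log-odds (the noise covariance is ignored).
    """
    x = year - centre
    eta = [a + b * x for a, b in gamma] + [0.0]
    return softmax(eta)

gamma = [(0.0, 0.3), (0.5, -0.1)]  # hypothetical coefficients, not estimates
for year in (1998, 2003, 2008):
    print(year, [round(p, 3) for p in prevalence_at(year, gamma)])
```

A positive slope (topic 0 here) shows up as a topic gaining share over the decade relative to the others, which is the "more or less important over time" comparison described above.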

[1] Blei, David M., and John D. Lafferty. "Dynamic topic models." Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006.

[2] Roberts, Margaret E., et al. "Structural Topic Models for Open-Ended Survey Responses." American Journal of Political Science 58.4 (2014): 1064-1082.

[3] Lafferty, John D., and David M. Blei. "Correlated topic models." Advances in Neural Information Processing Systems. 2006.



Source: https://stackoverflow.com/questions/47952931/dynamic-topic-models-topic-over-time-in-r
