topic-modeling

Is there any way to match Gensim LDA output with topics in pyLDAvis graph?

Deadly submitted on 2020-08-27 06:31:55
Question: I need to process the topics in the LDA output (lda.show_topics(num_topics=-1, num_words=100...) and then compare my results with the pyLDAvis graph, but the topics are numbered differently. Is there a way I can match them? Answer 1: If it's still relevant, have a look at the documentation (http://pyldavis.readthedocs.io/en/latest/modules/API.html). You may want to set sort_topics to False. This way the order of topics in gensim and pyLDAvis will be the same. At the same time, gensim's indexing
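A minimal sketch of the index bookkeeping this answer points at. It assumes that gensim topic ids start at 0 while pyLDAvis labels start at 1 (the excerpt is cut off before saying so), and that the prepared pyLDAvis object exposes a `topic_order` list of 1-based original ids in display order. Both helper names are hypothetical.

```python
def gensim_to_pyldavis(gensim_topic_id, topic_order=None):
    """Return the pyLDAvis label for a gensim topic id.

    topic_order: optional list such as the `topic_order` field of the
    prepared pyLDAvis data (assumed: 1-based original ids in display order).
    With sort_topics=False the order is unchanged, so only the offset differs.
    """
    if topic_order is None:
        return gensim_topic_id + 1
    return topic_order.index(gensim_topic_id + 1) + 1


def pyldavis_to_gensim(label, topic_order=None):
    """Return the gensim topic id for a pyLDAvis label (inverse mapping)."""
    if topic_order is None:
        return label - 1
    return topic_order[label - 1] - 1


# Where the flag would go (needs gensim and pyLDAvis installed; the module is
# pyLDAvis.gensim_models in recent releases and pyLDAvis.gensim in older ones):
# vis = pyLDAvis.gensim_models.prepare(lda, corpus, dictionary, sort_topics=False)
```

With `sort_topics=False` the mapping is a plain offset of one; passing `topic_order` covers the case where sorting was left on.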

Are there any R packages or published code on topic models that account for time?

对着背影说爱祢 submitted on 2020-08-03 07:31:28
Question: I am trying to perform topic modeling on a data set of political speeches that spans two centuries, and would ideally like to use a topic model that accounts for time, such as Topics over Time (McCallum and Wang 2006) or the Dynamic Topic Model (Blei and Lafferty 2006). However, given that I am not an experienced coder, an R package or some sample code implementing either of these topic models would really help. Does anyone know if such packages or published code exist for R? I

Are there any R packages or published code on topic models that account for time?

那年仲夏 submitted on 2020-08-03 07:29:52
Question: I am trying to perform topic modeling on a data set of political speeches that spans two centuries, and would ideally like to use a topic model that accounts for time, such as Topics over Time (McCallum and Wang 2006) or the Dynamic Topic Model (Blei and Lafferty 2006). However, given that I am not an experienced coder, an R package or some sample code implementing either of these topic models would really help. Does anyone know if such packages or published code exist for R? I

A practical example of GSDMM in python?

北战南征 submitted on 2020-06-26 06:16:53
Question: I want to use GSDMM to assign topics to some tweets in my data set. The only examples I found (1 and 2) are not detailed enough. I was wondering if you know of a source (or care enough to make a small example) that shows how GSDMM is implemented in Python. Answer 1: GSDMM (Gibbs Sampling Dirichlet Multinomial Mixture) is a short-text clustering model. It is essentially a modified LDA (Latent Dirichlet Allocation) which supposes that a document such as a tweet or any other text encompasses one
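A rough, standard-library-only sketch of the collapsed Gibbs sampler that GSDMM is usually described with, where every document belongs to exactly one cluster. It is not the code from the answer. The function name, the defaults for `K`, `alpha`, `beta` and `n_iters`, and the pre-tokenized input format are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, n_iters=15, seed=0):
    """Cluster tokenized documents; returns one cluster id per document."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    m = [0] * K                          # documents per cluster
    n = [0] * K                          # tokens per cluster
    nw = [Counter() for _ in range(K)]   # word counts per cluster
    doc_counts = [Counter(doc) for doc in docs]
    labels = []

    def move(d, z, sign):
        # Add (sign=+1) or remove (sign=-1) document d from cluster z.
        m[z] += sign
        n[z] += sign * len(docs[d])
        for w, c in doc_counts[d].items():
            nw[z][w] += sign * c

    # Random initial assignment.
    for d in range(len(docs)):
        z = rng.randrange(K)
        labels.append(z)
        move(d, z, +1)

    for _ in range(n_iters):
        for d in range(len(docs)):
            move(d, labels[d], -1)
            log_p = []
            for z in range(K):
                # Cluster popularity term (constant denominator omitted).
                lp = math.log(m[z] + alpha)
                # Word-match term, accounting for repeated words.
                for w, c in doc_counts[d].items():
                    for j in range(c):
                        lp += math.log(nw[z][w] + beta + j)
                for i in range(len(docs[d])):
                    lp -= math.log(n[z] + vocab_size * beta + i)
                log_p.append(lp)
            top = max(log_p)
            weights = [math.exp(lp - top) for lp in log_p]
            labels[d] = rng.choices(range(K), weights=weights)[0]
            move(d, labels[d], +1)
    return labels


# Hypothetical toy input: each tweet is already split into tokens.
tweets = [["cat", "dog", "pet"], ["dog", "pet"], ["stock", "market"], ["market", "trade"]]
cluster_ids = gsdmm(tweets, K=4)
```

`K` is an upper bound on the number of clusters; some clusters typically end up empty, which is how the model settles on fewer groups.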

Dynamic topic models/topic over time in R [closed]

喜欢而已 submitted on 2020-06-25 06:54:11
Question: I have a database of newspaper articles about water policy from 1998 to 2008. I would like to see how the newspaper release changes during this period. My question is, should I use Dynamic Topic Modeling or the Topics over Time model to handle this task? Would

probabilities returned by gensim's get_document_topics method doesn't add up to one

落爺英雄遲暮 submitted on 2020-06-12 05:14:26
Question: Sometimes it returns probabilities for all topics and all is fine, but sometimes it returns probabilities for just a few topics and they don't add up to one; it seems to depend on the document. Generally, when it returns few topics, the probabilities add up to roughly 80%, so is it returning just the most relevant topics? Is there a way to force it to return all probabilities? Maybe I'm missing something, but I can't find any documentation of the method's parameters. Answer 1: I had the same
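The behaviour described in the question matches how gensim's get_document_topics treats its `minimum_probability` argument: topics below that threshold are left out of the returned list. The sketch below is not the truncated answer. It only shows how the sparse `(topic_id, probability)` pairs can be padded to a full vector and how much mass was dropped. Both helper names are hypothetical.

```python
def dense_topic_vector(sparse_topics, num_topics):
    """Pad a list of (topic_id, probability) pairs to length num_topics."""
    dense = [0.0] * num_topics
    for topic_id, prob in sparse_topics:
        dense[topic_id] = float(prob)
    return dense


def missing_mass(sparse_topics):
    """Probability mass of the topics that were filtered out."""
    return 1.0 - sum(prob for _, prob in sparse_topics)


# With a trained gensim model, the threshold can be lowered at call time, e.g.
# lda.get_document_topics(bow, minimum_probability=0.0)
# (assumes `lda` and `bow` already exist; not run here).
```

A document whose returned pairs sum to about 0.8 therefore has about 0.2 spread over the omitted topics.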

How to properly encode UTF-8 txt files for R topic model

二次信任 submitted on 2020-04-30 09:27:18
Question: Similar issues have been discussed on this forum (e.g. here and here), but I have not found one that solves my problem, so I apologize for a seemingly similar question. I have a set of .txt files with UTF-8 encoding (see the screenshot). I am trying to run a topic model in R using the tm package. However, despite using encoding = "UTF-8" when creating the corpus, I get obvious encoding problems. For instance, I get <U+FB01>scal instead of fiscal, in<U+FB02>uenc instead of influence,
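The two code points in the question, U+FB01 and U+FB02, are the typographic ligatures "fi" and "fl", so the files are likely decoded correctly and simply contain ligature characters. One common remedy, sketched here in Python rather than R, is Unicode compatibility normalization (NFKC), which expands such ligatures into plain letters. The function name is hypothetical.

```python
import unicodedata


def normalize_ligatures(text):
    """Expand compatibility characters such as U+FB01 into plain letters."""
    return unicodedata.normalize("NFKC", text)


cleaned = normalize_ligatures("\ufb01scal in\ufb02uence")
```

Running this step before tokenization keeps words like "fiscal" from being split or displayed as escaped code points.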

How to understand the output of Topic Model class in Mallet?

可紊 submitted on 2020-02-26 06:36:22
Question: As I'm trying out the example code in the topic modeling developer's guide, I really want to understand the meaning of its output. First, while running, it prints: Coded LDA: 10 topics, 4 topic bits, 1111 topic mask max tokens: 148 total tokens: 1333 <10> LL/token: -9,24097 <20> LL/token: -9,1026 <30> LL/token: -8,95386 <40> LL/token: -8,75353 0 0,5 battle union confederate tennessee american states 1 0,5 hawes sunderland echo war paper commonwealth 2 0,5 test
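A hedged reading of the first lines of that log, sketched in Python. It assumes the "topic bits" and "topic mask" values describe how many low-order bits are needed to store a topic id (10 topics need 4 bits, mask 1111), and that each `<N> LL/token` entry is the per-token log-likelihood after N sampling iterations, printed here with a comma as the decimal separator. The function names are hypothetical and this is not MALLET's own code.

```python
import re


def topic_bits_and_mask(num_topics):
    """Return (bits, mask) wide enough to hold any topic id below num_topics."""
    if num_topics & (num_topics - 1) == 0:      # exact power of two
        mask = num_topics - 1
    else:
        highest = 1 << (num_topics.bit_length() - 1)
        mask = highest * 2 - 1
    return bin(mask).count("1"), mask


def parse_ll_per_token(log_text):
    """Extract (iteration, log-likelihood per token) pairs from the log."""
    pairs = re.findall(r"<(\d+)> LL/token: (-?\d+[.,]\d+)", log_text)
    return [(int(it), float(ll.replace(",", "."))) for it, ll in pairs]


bits, mask = topic_bits_and_mask(10)
trace = parse_ll_per_token("<10> LL/token: -9,24097 <20> LL/token: -9,1026")
```

Values that become less negative from one entry to the next, as in the excerpt, suggest the sampler is still improving its fit. The lines that follow appear to list a topic number, a weight of 0,5, and that topic's top words.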

Topic Modelling by Group using LDA in R

你。 submitted on 2020-02-01 09:37:14
Question: I am stuck on one problem. I am trying to categorize sentences into topics using LDA. I have done it; however, the problem is that LDA works on the whole dataset and gives me topic terms across the dataset. I want to get the topic terms by group in the dataset. So my data looks like this: Comment Division Smooth execution of Regional Administration in my absence. Well done. Finance Job well done in completing CPs and making the facility available well in time. Finance Good Job
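One way to approach the per-group requirement is to split the comments by Division first and fit a separate model on each subset. The sketch below, in Python rather than R, shows only that splitting step. `fit_fn` is a hypothetical stand-in for whatever LDA routine is used, and the "HR" row is invented for illustration.

```python
from collections import defaultdict


def split_by_group(rows):
    """Group (comment, division) rows into {division: [comments]}."""
    groups = defaultdict(list)
    for comment, division in rows:
        groups[division].append(comment)
    return dict(groups)


def fit_per_group(rows, fit_fn):
    """Apply fit_fn (e.g. an LDA fitting routine) to each group's comments."""
    return {group: fit_fn(comments) for group, comments in split_by_group(rows).items()}


rows = [
    ("Smooth execution of Regional Administration in my absence. Well done.", "Finance"),
    ("Job well done in completing CPs and making the facility available well in time.", "Finance"),
    ("Great onboarding support.", "HR"),  # hypothetical row, not from the question
]
counts = fit_per_group(rows, len)  # len stands in for a real model-fitting call
```

Each group then gets its own vocabulary and topic terms, at the cost of fewer documents per model.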