lda

Is there any way to match Gensim LDA output with topics in the pyLDAvis graph?

Deadly submitted on 2020-08-27 06:31:55
Question: I need to process the topics in the LDA output (lda.show_topics(num_topics=-1, num_words=100, ...)) and then compare the results with the pyLDAvis graph, but the topics are numbered differently in each. Is there a way to match them?

Answer 1: If it's still relevant, have a look at the documentation: http://pyldavis.readthedocs.io/en/latest/modules/API.html. You may want to set sort_topics to False. This way the order of topics in gensim and pyLDAvis will be the same. At the same time, gensim's indexing starts at 0 while pyLDAvis labels topics from 1, so gensim topic k corresponds to pyLDAvis topic k + 1.
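A minimal sketch of the sort_topics fix, assuming lda, corpus, and dictionary are the trained gensim model, its training corpus, and its dictionary (in newer pyLDAvis releases the module is pyLDAvis.gensim_models rather than pyLDAvis.gensim):

    import pyLDAvis
    import pyLDAvis.gensim

    # Keep gensim's topic order instead of re-sorting topics by prevalence.
    vis = pyLDAvis.gensim.prepare(lda, corpus, dictionary, sort_topics=False)
    pyLDAvis.save_html(vis, 'lda_vis.html')

    # pyLDAvis still labels topics starting at 1, so shift by one to match.
    for topic_id, _ in lda.show_topics(num_topics=-1, num_words=10, formatted=False):
        print(f"gensim topic {topic_id} -> pyLDAvis topic {topic_id + 1}")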

How do I measure perplexity scores on an LDA model made with the textmineR package in R?

ぃ、小莉子 submitted on 2020-07-09 05:53:10
Question: I've made an LDA topic model in R using the textmineR package; it looks as follows.

    ## get textmineR dtm
    dtm2 <- CreateDtm(doc_vec = dat2$fulltext,  # character vector of documents
                      ngram_window = c(1, 2),
                      doc_names = dat2$names,
                      stopword_vec = c(stopwords::stopwords("da"), custom_stopwords),
                      lower = T,               # lowercase - this is the default value
                      remove_punctuation = T,  # punctuation - this is the default
                      remove_numbers = T,      # numbers - this is the default
                      verbose = T,
                      cpus = 4)
    dtm2 <- dtm2[,
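The question is about textmineR, but as a point of reference, this is how the same measurement works with a gensim model in Python; the toy corpus here is purely illustrative:

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # Tiny illustrative corpus: each document is a list of tokens.
    docs = [["cat", "dog"], ["dog", "bone"], ["stock", "market"], ["market", "price"]]
    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(d) for d in docs]

    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)

    # log_perplexity returns the per-word likelihood bound in log base 2;
    # perplexity is 2 raised to its negation, so lower is better.
    bound = lda.log_perplexity(corpus)
    print(2 ** (-bound))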

A practical example of GSDMM in Python?

北战南征 submitted on 2020-06-26 06:16:53
Question: I want to use GSDMM to assign topics to some tweets in my data set. The only examples I found (1 and 2) are not detailed enough. I was wondering if you know of a source (or care enough to make a small example) that shows how GSDMM is implemented using Python.

Answer 1: GSDMM (Gibbs Sampling Dirichlet Multinomial Mixture) is a short-text clustering model. It is essentially a modified LDA (Latent Dirichlet Allocation) which assumes that a document, such as a tweet or any other short text, encompasses one topic.
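A minimal worked sketch, assuming the gsdmm package from https://github.com/rwalk/gsdmm (installable with pip install git+https://github.com/rwalk/gsdmm.git); the toy tweets are illustrative:

    from gsdmm import MovieGroupProcess

    # Toy corpus: each "tweet" is a list of tokens.
    docs = [
        ["cat", "dog", "pet"],
        ["dog", "puppy", "walk"],
        ["stock", "market", "trade"],
        ["market", "price", "stock"],
    ]
    vocab_size = len({tok for doc in docs for tok in doc})

    # K is an upper bound on the number of clusters; GSDMM typically settles on fewer.
    mgp = MovieGroupProcess(K=4, alpha=0.1, beta=0.1, n_iters=30)
    labels = mgp.fit(docs, vocab_size)  # one cluster label per document
    print(labels)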

Probabilities returned by gensim's get_document_topics method don't add up to one

落爺英雄遲暮 submitted on 2020-06-12 05:14:26
Question: Sometimes it returns probabilities for all topics and all is fine, but sometimes it returns probabilities for only a few topics, and they don't add up to one; it seems to depend on the document. Generally, when it returns few topics the probabilities add up to roughly 80%, so is it returning just the most relevant topics? Is there a way to force it to return all probabilities? Maybe I'm missing something, but I can't find any documentation of the method's parameters.

Answer 1: I had the same problem. By default, gensim drops topics whose probability for a document falls below the model's minimum_probability threshold (0.01 by default), which is why the returned values can sum to less than one; pass minimum_probability=0 to get the full distribution.
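A minimal sketch of that fix, assuming lda is a trained gensim LdaModel and bow is a bag-of-words document (a list of (token_id, count) pairs):

    # Topics below minimum_probability are normally dropped from the output;
    # setting it to 0 returns the full distribution, which sums to ~1.0.
    topics = lda.get_document_topics(bow, minimum_probability=0)
    print(sum(prob for _, prob in topics))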

gensim.interfaces.TransformedCorpus - how to use it?

久未见 submitted on 2020-04-10 03:33:52
Question: I'm relatively new to the world of Latent Dirichlet Allocation. I am able to generate an LDA model following the Wikipedia tutorial, and I'm able to generate an LDA model with my own documents. My next step is to understand how I can use a previously generated model to classify unseen documents. I'm saving my "lda_wiki_model" with

    id2word = gensim.corpora.Dictionary.load_from_text('ptwiki_wordids.txt.bz2')
    mm = gensim.corpora.MmCorpus('ptwiki_tfidf.mm')
    lda = gensim.models.ldamodel.LdaModel(corpus=mm,
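A minimal sketch of applying a saved model to unseen text, assuming the model was persisted with lda.save('lda_wiki_model') (the file names follow the question). Indexing the model with a single bag-of-words document returns its topic distribution directly, while applying it to a whole corpus yields the gensim.interfaces.TransformedCorpus from the title, which you iterate over:

    import gensim

    lda = gensim.models.ldamodel.LdaModel.load('lda_wiki_model')
    id2word = gensim.corpora.Dictionary.load_from_text('ptwiki_wordids.txt.bz2')

    # Convert an unseen document into the dictionary's bag-of-words space.
    tokens = "some unseen document text".lower().split()
    bow = id2word.doc2bow(tokens)

    print(lda[bow])           # topic distribution for this single document
    transformed = lda[[bow]]  # a TransformedCorpus: one result per document
    for doc_topics in transformed:
        print(doc_topics)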
