lda

Is there any way to match Gensim LDA output with topics in the pyLDAvis graph?

Deadly submitted on 2020-08-27 06:31:55
Question: I need to process the topics in the LDA output (lda.show_topics(num_topics=-1, num_words=100, ...)) and then compare the results with the pyLDAvis graph, but the topics are numbered differently in each. Is there a way to match them?

Answer 1: If it's still relevant, have a look at the documentation: http://pyldavis.readthedocs.io/en/latest/modules/API.html. You may want to set sort_topics to False. This way the order of topics in gensim and pyLDAvis will be the same. At the same time, gensim's indexing starts at 0 while pyLDAvis labels topics from 1, so gensim topic k corresponds to pyLDAvis topic k + 1.
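A minimal sketch of the sort_topics fix, assuming lda, corpus, and dictionary are the trained gensim model, its training corpus, and its dictionary (in newer pyLDAvis releases the module is pyLDAvis.gensim_models rather than pyLDAvis.gensim):

    import pyLDAvis
    import pyLDAvis.gensim

    # Keep gensim's topic order instead of re-sorting topics by prevalence.
    vis = pyLDAvis.gensim.prepare(lda, corpus, dictionary, sort_topics=False)
    pyLDAvis.save_html(vis, 'lda_vis.html')

    # pyLDAvis still labels topics starting at 1, so shift by one to match.
    for topic_id, _ in lda.show_topics(num_topics=-1, num_words=10, formatted=False):
        print(f"gensim topic {topic_id} -> pyLDAvis topic {topic_id + 1}")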

How do I measure perplexity scores on an LDA model made with the textmineR package in R?

ぃ、小莉子 submitted on 2020-07-09 05:53:10
Question: I've made an LDA topic model in R using the textmineR package; it looks as follows.

    ## get textmineR dtm
    dtm2 <- CreateDtm(doc_vec = dat2$fulltext,  # character vector of documents
                      ngram_window = c(1, 2),
                      doc_names = dat2$names,
                      stopword_vec = c(stopwords::stopwords("da"), custom_stopwords),
                      lower = T,               # lowercase - this is the default value
                      remove_punctuation = T,  # punctuation - this is the default
                      remove_numbers = T,      # numbers - this is the default
                      verbose = T,
                      cpus = 4)
    dtm2 <- dtm2[,
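The question is about textmineR, but as a point of reference, this is how the same measurement works with a gensim model in Python; the toy corpus here is purely illustrative:

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # Tiny illustrative corpus: each document is a list of tokens.
    docs = [["cat", "dog"], ["dog", "bone"], ["stock", "market"], ["market", "price"]]
    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(d) for d in docs]

    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)

    # log_perplexity returns the per-word likelihood bound in log base 2;
    # perplexity is 2 raised to its negation, so lower is better.
    bound = lda.log_perplexity(corpus)
    print(2 ** (-bound))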

A practical example of GSDMM in Python?

北战南征 submitted on 2020-06-26 06:16:53
Question: I want to use GSDMM to assign topics to some tweets in my data set. The only examples I found (1 and 2) are not detailed enough. I was wondering if you know of a source (or care enough to make a small example) that shows how GSDMM is implemented using Python.

Answer 1: GSDMM (Gibbs Sampling Dirichlet Multinomial Mixture) is a short-text clustering model. It is essentially a modified LDA (Latent Dirichlet Allocation) which assumes that a document, such as a tweet or any other short text, encompasses one topic.
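A minimal worked sketch, assuming the gsdmm package from https://github.com/rwalk/gsdmm (installable with pip install git+https://github.com/rwalk/gsdmm.git); the toy tweets are illustrative:

    from gsdmm import MovieGroupProcess

    # Toy corpus: each "tweet" is a list of tokens.
    docs = [
        ["cat", "dog", "pet"],
        ["dog", "puppy", "walk"],
        ["stock", "market", "trade"],
        ["market", "price", "stock"],
    ]
    vocab_size = len({tok for doc in docs for tok in doc})

    # K is an upper bound on the number of clusters; GSDMM typically settles on fewer.
    mgp = MovieGroupProcess(K=4, alpha=0.1, beta=0.1, n_iters=30)
    labels = mgp.fit(docs, vocab_size)  # one cluster label per document
    print(labels)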

Probabilities returned by gensim's get_document_topics method don't add up to one

落爺英雄遲暮 submitted on 2020-06-12 05:14:26
Question: Sometimes it returns probabilities for all topics and all is fine, but sometimes it returns probabilities for only a few topics, and they don't add up to one; it seems to depend on the document. Generally, when it returns few topics the probabilities add up to roughly 80%, so is it returning just the most relevant topics? Is there a way to force it to return all probabilities? Maybe I'm missing something, but I can't find any documentation of the method's parameters.

Answer 1: I had the same problem. By default, gensim drops topics whose probability for a document falls below the model's minimum_probability threshold (0.01 by default), which is why the returned values can sum to less than one; pass minimum_probability=0 to get the full distribution.
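A minimal sketch of that fix, assuming lda is a trained gensim LdaModel and bow is a bag-of-words document (a list of (token_id, count) pairs):

    # Topics below minimum_probability are normally dropped from the output;
    # setting it to 0 returns the full distribution, which sums to ~1.0.
    topics = lda.get_document_topics(bow, minimum_probability=0)
    print(sum(prob for _, prob in topics))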

gensim.interfaces.TransformedCorpus - how to use it?

久未见 submitted on 2020-04-10 03:33:52
Question: I'm relatively new to the world of Latent Dirichlet Allocation. I am able to generate an LDA model following the Wikipedia tutorial, and I'm able to generate an LDA model with my own documents. My next step is to understand how I can use a previously generated model to classify unseen documents. I'm saving my "lda_wiki_model" with

    id2word = gensim.corpora.Dictionary.load_from_text('ptwiki_wordids.txt.bz2')
    mm = gensim.corpora.MmCorpus('ptwiki_tfidf.mm')
    lda = gensim.models.ldamodel.LdaModel(corpus=mm,
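A minimal sketch of applying a saved model to unseen text, assuming the model was persisted with lda.save('lda_wiki_model') (the file names follow the question). Indexing the model with a single bag-of-words document returns its topic distribution directly, while applying it to a whole corpus yields the gensim.interfaces.TransformedCorpus from the title, which you iterate over:

    import gensim

    lda = gensim.models.ldamodel.LdaModel.load('lda_wiki_model')
    id2word = gensim.corpora.Dictionary.load_from_text('ptwiki_wordids.txt.bz2')

    # Convert an unseen document into the dictionary's bag-of-words space.
    tokens = "some unseen document text".lower().split()
    bow = id2word.doc2bow(tokens)

    print(lda[bow])           # topic distribution for this single document
    transformed = lda[[bow]]  # a TransformedCorpus: one result per document
    for doc_topics in transformed:
        print(doc_topics)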
