Hierarchical Dirichlet Process Gensim topic number independent of corpus size

余生分开走 2021-02-04 07:20

I am using the Gensim HDP module on a set of documents.

>>> hdp = models.HdpModel(corpusB, id2word=dictionaryB)
>>> topics = hdp.print_topics(         


        
7 answers
  •  陌清茗 (original poster)
    2021-02-04 07:46

    I haven't used gensim for HDPs, but is it possible that most of the topics in the smaller corpus have an extremely low probability of occurring? Can you try printing the topic probabilities? The length of the topics array doesn't necessarily mean that all of those topics were actually found in the corpus.
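
A minimal sketch of that check, under some assumptions: it takes per-document topic distributions as lists of (topic_id, probability) pairs, which is the shape I'd expect gensim's `hdp[bow]` lookup to return, sums them over the corpus, and counts the topics that hold a non-trivial share. The helper names `topic_mass` and `active_topics` and the 1% threshold are illustrative only and are not part of gensim.

```python
from collections import defaultdict


def topic_mass(doc_topics):
    """Sum topic probabilities over all documents and normalise to shares.

    doc_topics: iterable of lists of (topic_id, probability) pairs,
    one list per document (assumed shape, see the note above).
    """
    totals = defaultdict(float)
    for pairs in doc_topics:
        for topic_id, prob in pairs:
            totals[topic_id] += prob
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {topic_id: mass / grand_total for topic_id, mass in totals.items()}


def active_topics(shares, min_share=0.01):
    """Return the ids of topics whose corpus-wide share is at least min_share."""
    return sorted(t for t, share in shares.items() if share >= min_share)


# Toy input standing in for real model output (made-up numbers).
example = [
    [(0, 0.9), (3, 0.1)],
    [(0, 0.5), (1, 0.5)],
    [(1, 0.995), (7, 0.005)],
]
shares = topic_mass(example)
print(active_topics(shares))
```

With the model from the question, the input would be something like `(hdp[bow] for bow in corpusB)`. As far as I know that lookup drops very small per-document probabilities, so the shares are approximate. If only a handful of topic ids clear the threshold on the small corpus, the rest of the printed topics are probably just unused slots.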
