How to extract bigram topics instead of unigrams using Latent Dirichlet Allocation (LDA) in Python with gensim?

猫巷女王i 2021-02-06 14:32

Original LDA output

  • Unigrams

    • topic1: scuba, water, vapor, diving

    • topic2: dioxide, plants, green, carbon

2 answers
  •  难免孤独
    2021-02-06 15:08

    You can use word2vec to find the terms most similar to the top n topic terms extracted by LDA.

    LDA Output

    Build a dictionary of bigrams from the extracted topics (for example, san_francisco).

    See http://www.markhneedham.com/blog/2015/02/12/pythongensim-creating-bigrams-over-how-i-met-your-mother-transcripts/ for how to create bigrams with gensim.

    Then train word2vec to find the most similar words (unigrams, bigrams, etc.).

    Word (cosine distance)

    los_angeles (0.666175)
    golden_gate (0.571522)
    oakland (0.557521)

    See https://code.google.com/p/word2vec/ (section "From words to phrases and beyond").
