Understanding parameters in Gensim LDA Model

礼貌的吻别 2021-01-05 10:09

I am using gensim.models.ldamodel.LdaModel to perform LDA, but I do not understand some of the parameters and cannot find explanations in the documentation. If

1 Answer
  • 2021-01-05 10:59

    I wonder if you have seen this page?

    Either way, let me explain a few things. Your corpus is small for this method, which works much better when trained on a data source the size of Wikipedia, so the results will be rather crude and you should be aware of that. For the same reason you should not aim for a large number of topics: you chose 10, which in your case could sensibly go up to maybe 20.

    As for the other parameters:

    • random_state - this serves as a seed (in case you want to reproduce the training process exactly)

    • chunksize - number of documents to consider at once (affects the memory consumption)

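    • update_every - update the model after every update_every chunks of chunksize documents; setting it to 0 gives batch learning, where the model is updated once per pass over the whole corpus (so this controls online versus batch learning rather than memory use)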

    • passes - how many times the algorithm passes over the whole corpus during training

    • alpha - to cite the documentation:

      can be set to an explicit array = prior of your choice. It also support special values of ‘asymmetric’ and ‘auto’: the former uses a fixed normalized asymmetric 1.0/topicno prior, the latter learns an asymmetric prior directly from your data.

    • per_word_topics - setting this to True allows for extraction of the most likely topics for a given word. Training is then set up so that every word is assigned to a topic; otherwise, words that are not indicative are omitted. minimum_phi_value is another parameter that steers this process: it is the threshold that decides whether a word is treated as indicative or not.

    Optimal training process parameters are described particularly well in M. Hoffman et al., Online Learning for Latent Dirichlet Allocation.
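
As a rough illustration of what that paper tunes: its online updates use a step size of the form rho_t = (tau_0 + t)^(-kappa), with kappa in (0.5, 1]. The gensim documentation describes the decay and offset parameters of LdaModel as corresponding to kappa and tau_0. The sketch below only evaluates that formula; the function name and the values are illustrative assumptions, and gensim's internal schedule may differ in detail.

```python
def learning_rate(t, tau0=1.0, kappa=0.7):
    """Step size rho_t = (tau0 + t) ** (-kappa) for update number t."""
    return (tau0 + t) ** (-kappa)


# A larger tau0 damps the first updates; a larger kappa makes the
# influence of later chunks fade faster.
for t in (0, 1, 10, 100):
    print(t, learning_rate(t), learning_rate(t, tau0=64.0))
```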

    For memory optimization of the training process or the model see this blog post.
