How to specify random_state in an LDA model for topic modelling

Question

I read the gensim LDA model documentation about random_state which states that:

random_state ({np.random.RandomState, int}, optional)

– Either a randomState object or a seed to generate one. Useful for reproducibility.

I have tried passing random_state=42, and also:

random_seed=42
state=np.random.RandomState(random_seed)
state.randn(1)
random_state=state.randn(1)

Neither worked. Can anyone suggest what I should do?

model=ldaModel(corpus=corpus, id2word=dictionary, num_topics=num_topics, random_state=None)

When I call the function without random_state it works, but with random_state I get an error message saying that the LDA model is not defined.

def compute_coherence_values(dictionary, corpus, texts, limit, random_state, start=2, step=3):

    coherence_values = []
    model_list = []
    for num_topics in range(start, limit, step):
        #model=LdaModel(corpus=corpus, id2word=dictionary, num_topics=num_topics)
        model=ldaModel(corpus=corpus, id2word=dictionary, num_topics=num_topics,
                       random_state)
        model_list.append(model)
        coherencemodel = CoherenceModel(model=model, texts=texts, dictionary=dictionary, coherence='c_v')
        coherence_values.append(coherencemodel.get_coherence())

    return model_list, coherence_values

Answer 1:

The mistake in your code is here:

 model=ldaModel(corpus=corpus, id2word=dictionary, num_topics=num_topics, 
                                                  random_state)

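You can't pass the variable random_state positionally after keyword arguments. In a Python call, once one argument is given by name (corpus=..., id2word=..., num_topics=...), every argument after it must also be given by name; otherwise Python rejects the call with "SyntaxError: positional argument follows keyword argument". Also note that the gensim class is spelled LdaModel, as in your commented-out line. Names in Python are case-sensitive, so ldaModel raises "NameError: name 'ldaModel' is not defined" unless you have imported it under that name yourself. So it should be like this:

model = LdaModel(corpus=corpus, id2word=dictionary, num_topics=num_topics,
                 random_state=random_state)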
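To see both problems in isolation, here is a minimal sketch that needs nothing but the standard library. The function fake_lda_model is a made-up stand-in that only mimics a constructor with keyword parameters; it is not gensim's LdaModel, and the helper check_syntax is also just for illustration.

```python
# Hypothetical stand-in for a constructor with keyword parameters.
# This is NOT gensim's LdaModel; it only echoes two of its arguments.
def fake_lda_model(corpus=None, id2word=None, num_topics=100, random_state=None):
    return {"num_topics": num_topics, "random_state": random_state}


def check_syntax(source):
    """Return None if `source` compiles as an expression, else the error message."""
    try:
        compile(source, "<example>", "eval")
        return None
    except SyntaxError as exc:
        return exc.msg


# Positional argument after keyword arguments: rejected before anything runs.
bad_error = check_syntax("fake_lda_model(corpus=[], num_topics=5, random_state)")

# Every argument named: compiles fine.
good_error = check_syntax(
    "fake_lda_model(corpus=[], num_topics=5, random_state=random_state)"
)

random_state = 42
model = fake_lda_model(corpus=[], num_topics=5, random_state=random_state)

# Python names are case-sensitive, so a differently cased name is undefined.
try:
    fake_Lda_Model(num_topics=5)  # wrong capitalisation on purpose
    name_error = None
except NameError as exc:
    name_error = str(exc)

print("bad call: ", bad_error)
print("good call:", good_error)
print("model:    ", model)
print("wrong case:", name_error)
```

The first call never executes at all, because the syntax error is raised when the code is compiled. That is why the whole function definition fails, not just the one line.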

For comparison, I have an LDA implementation that uses LatentDirichletAllocation from sklearn.decomposition, and its random_state parameter also accepts a plain integer. Here is an example:

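from sklearn.decomposition import LatentDirichletAllocation

lda_model = LatentDirichletAllocation(n_components=10,           # number of topics
                                      max_iter=10,
                                      learning_method='online',
                                      random_state=100,          # integer seed for reproducibility
                                      batch_size=128,
                                      evaluate_every=-1,
                                      n_jobs=-1)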
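Going back to the first attempt in the question: the documentation quoted there asks for either an int seed or an np.random.RandomState object, but state.randn(1) returns neither. It returns a NumPy array holding one random float. The sketch below, which assumes only that NumPy is installed, shows the difference. The helper is_valid_random_state is a name I made up for illustration; it just checks for the two kinds of value the documentation lists and is not part of gensim.

```python
import numpy as np

seed = 42  # illustrative value

# The two kinds of value the quoted documentation lists.
state = np.random.RandomState(seed)  # a RandomState object
# ... or simply the integer `seed` itself.

# What the question passed instead: the result of randn(1),
# which is a float array drawn from the generator, not the generator.
not_a_state = np.random.RandomState(seed).randn(1)


def is_valid_random_state(value):
    """Hypothetical check: True for an int seed or a RandomState object."""
    return isinstance(value, (int, np.integer, np.random.RandomState))


# Same seed, same sequence of draws: this is what makes runs reproducible.
draws_a = np.random.RandomState(seed).rand(3)
draws_b = np.random.RandomState(seed).rand(3)
same = np.array_equal(draws_a, draws_b)

print(type(not_a_state).__name__, not_a_state.shape)
print("int seed ok:     ", is_valid_random_state(seed))
print("RandomState ok:  ", is_valid_random_state(state))
print("randn(1) ok:     ", is_valid_random_state(not_a_state))
print("same seed, same draws:", same)
```

So in the question's snippet, passing random_state=random_seed or random_state=state would match the documented types, while random_state=state.randn(1) would not.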

Here is a good tutorial on how to implement an LDA model.

Source: https://stackoverflow.com/questions/61373994/how-to-specify-random-state-in-lda-model-for-topic-modelling

Tags

python

text

model

word