I am using the Gensim HDP module on a set of documents.
>>> hdp = models.HdpModel(corpusB, id2word=dictionaryB)
>>> topics = hdp.print_topics(
Deriving the average coherence of HDP topics from their coherence at the individual text level is a way to order (and potentially truncate) them. The following function does just that:
def order_subset_by_coherence(dirichlet_model, bow_corpus, num_topics=10, num_keywords=10):
"""
Orders topics based on their average coherence across the corpus
Parameters
----------
dirichlet_model : gensim.models.hdpmodel.HdpModel
bow_corpus : list of lists (contains (id, freq) tuples)
num_topics : int (default=10)
num_keywords : int (default=10)
Returns
-------
ordered_topics: list of lists containing topic tokens
"""
shown_topics = dirichlet_model.show_topics(num_topics=150, # return all topics
num_words=num_keywords,
formatted=False)
model_topics = [[word[0] for word in topic[1]] for topic in shown_topics]
topic_corpus = dirichlet_model.__getitem__(bow=bow_corpus, eps=0) # cutoff probability to 0
topics_per_response = [response for response in topic_corpus]
flat_topic_coherences = [item for sublist in topics_per_response for item in sublist]
significant_topics = list(set([t_c[0] for t_c in flat_topic_coherences])) # those that appear
topic_averages = [sum([t_c[1] for t_c in flat_topic_coherences if t_c[0] == topic_num]) / len(bow_corpus) \
for topic_num in significant_topics]
topic_indexes_by_avg_coherence = [tup[0] for tup in sorted(enumerate(topic_averages), key=lambda i:i[1])[::-1]]
significant_topics_by_avg_coherence = [significant_topics[i] for i in topic_indexes_by_avg_coherence]
ordered_topics = [model_topics[i] for i in significant_topics_by_avg_coherence][:num_topics] # truncate if desired
return ordered_topics
A version of this function that includes an output of the averages coherences associated with the topics for keyword (tag) generation for a corpus can be found in this answer. A similar process for keywords for individual texts can further be found in this answer.