Text clustering using SciPy Hierarchical Clustering in Python

醉话见心 2021-01-03 16:26

I have a text corpus that contains 1000+ articles, each on a separate line. I am trying to use hierarchical clustering with SciPy in Python to produce clusters of related articles…
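For context, the clustering step described above might look roughly like the following minimal sketch. It assumes a naive lowercase whitespace tokenisation into a raw term-count matrix built with numpy (a real corpus would more likely use TF-IDF weighting), average linkage on cosine distances, and a cut into a fixed number of flat clusters. The toy `articles` list and the `term_matrix` helper are hypothetical stand-ins, not part of the original question.

```python
from collections import Counter

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical toy corpus standing in for the 1000+ articles (one article per line).
articles = [
    "stock market prices rise",
    "market prices fall on stock news",
    "team wins the football match",
    "football team loses the match",
]


def term_matrix(docs):
    """Hypothetical helper: build a raw term-count matrix (rows = documents)."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenised for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for row, toks in enumerate(tokenised):
        for word, count in Counter(toks).items():
            X[row, index[word]] = count
    return X, vocab


X, vocab = term_matrix(articles)

# Average-linkage hierarchical clustering on pairwise cosine distances between articles.
Z = linkage(X, method="average", metric="cosine")

# Cut the tree into at most 2 flat clusters; labels[i] is the cluster of articles[i].
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

`fcluster` with `criterion="maxclust"` asks for at most `t` clusters, and the returned label array is in the same order as the input rows, so each label can be matched back to its article by position.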

1 Answer
  • 2021-01-03 17:04

    You can do the following:

    1. Align your results (your cluster labels) with your input (the 1000+ articles), so that every article is paired with its cluster number. If you used `fcluster`, its label array is already in the same order as the rows you clustered.
    2. Using the pandas library, call `groupby` with the cluster number as the key.
    3. For each group (retrieved with `get_group`), count every word you encounter in a `defaultdict(int)`.
    4. Sort the word counts in descending order and take as many of the most frequent words as you need.
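Putting the four steps together, a minimal sketch might look like this. The `articles` and `labels` lists are hypothetical inputs (the labels stand in for whatever your clustering step produced, in the same order as the articles), the `top_words_per_cluster` helper is hypothetical, and tokenisation is a naive lowercase whitespace split.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical inputs: the articles and their cluster labels, in the same order
# (for example, the array returned by scipy.cluster.hierarchy.fcluster).
articles = [
    "stock market prices rise",
    "market prices fall on stock news",
    "team wins the football match",
    "football team loses the match",
]
labels = [1, 1, 2, 2]


def top_words_per_cluster(articles, labels, n=3):
    # Step 1: align each article with its cluster label in one DataFrame.
    df = pd.DataFrame({"article": articles, "cluster": labels})
    # Step 2: group the rows by cluster number.
    grouped = df.groupby("cluster")
    result = {}
    for cluster_id in grouped.groups:
        # Step 3: count every word in this cluster's articles.
        counts = defaultdict(int)
        for text in grouped.get_group(cluster_id)["article"]:
            for word in text.lower().split():
                counts[word] += 1
        # Step 4: sort by count (descending) and keep the n most frequent words.
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        result[cluster_id] = [word for word, _ in ranked[:n]]
    return result


print(top_words_per_cluster(articles, labels))
```

With these toy inputs, cluster 1 yields stock, market and prices, while cluster 2's list includes "the", so on a real corpus you would probably want to drop stop words before counting. `collections.Counter` with `most_common(n)` would cover steps 3 and 4 in one call; `defaultdict(int)` is used here only to mirror the steps above.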

    Good luck with what you're doing and please do accept my answer if it's what you're looking for.
