Effective clustering of a similarity matrix

轻奢々 2021-02-09 19:19

My topic is similarity and clustering of a collection of texts. In a nutshell: I want to cluster the collected texts so that they end up in meaningful clusters.

3 answers
  • 2021-02-09 19:37

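    Maybe you can transform your similarity matrix into a dissimilarity matrix, for example by mapping each similarity x to 1/x; the problem then becomes clustering a dissimilarity matrix. Note that 1/x is undefined when x is 0, so for similarities in [0, 1], such as cosine similarity on word counts, 1 - x is a common alternative. I think hierarchical clustering may work here. These may help you: "hierarchical clustering" and "Clustering a dissimilarity matrix".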
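As a rough illustration of that suggestion, here is a minimal sketch in plain Python. It assumes similarities in [0, 1], uses 1 - x rather than 1/x, and picks average linkage with a cut-off of 0.5; the function names, the toy matrix, and the threshold are all made up for the example. In practice a library routine would normally replace the hand-written loop.

```python
def to_dissimilarity(sim):
    """Convert similarities in [0, 1] to dissimilarities via 1 - s (an assumed choice)."""
    return [[1.0 - s for s in row] for row in sim]


def average_linkage(dist, threshold):
    """Merge clusters bottom-up until the closest pair is farther apart than threshold.

    dist is a symmetric matrix of pairwise dissimilarities.
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def cluster_distance(a, b):
        # Average linkage: mean dissimilarity over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        d, p, q = min(
            (
                (cluster_distance(clusters[p], clusters[q]), p, q)
                for p in range(len(clusters))
                for q in range(p + 1, len(clusters))
            ),
            key=lambda t: t[0],
        )
        if d > threshold:
            break  # remaining clusters are too far apart to merge
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical 5x5 similarity matrix: items 0-2 are alike, items 3-4 are alike.
similarity = [
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.85, 0.15, 0.1],
    [0.8, 0.85, 1.0, 0.2, 0.1],
    [0.1, 0.15, 0.2, 1.0, 0.95],
    [0.2, 0.1, 0.1, 0.95, 1.0],
]
clusters = average_linkage(to_dissimilarity(similarity), threshold=0.5)
print(clusters)
```

The threshold plays the role of cutting the dendrogram at a fixed height, so the number of clusters does not have to be known in advance.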

  • 2021-02-09 19:41

    Just try some. There are so many clustering algorithms out there that nobody knows all of them. Plus, the result depends a lot on your data set and on the clustering structure that is actually there. In the end, there may also be just one monster cluster with respect to cosine distance and bag-of-words features.
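One way to check for the "monster cluster" case before trying algorithms is to look at the pairwise similarities themselves. The sketch below is only an assumption about how the matrix might be built: it uses whitespace tokenization, raw word counts, and three made-up texts, and the helper names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Made-up example texts.
texts = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "stock prices fell sharply today",
]
bags = [bag_of_words(t) for t in texts]
sim = [[cosine_similarity(x, y) for y in bags] for x in bags]

# Off-diagonal values only: the diagonal is always 1 for non-empty texts.
pairs = [sim[i][j] for i in range(len(texts)) for j in range(i + 1, len(texts))]
print("min:", min(pairs), "max:", max(pairs))
```

If nearly all off-diagonal values sit in a narrow, high range, most algorithms will tend to return one large cluster, and changing the features may help more than changing the algorithm.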

  • 2021-02-09 19:47

    Since you are new to the field, have an unknown number of clusters, and are already using cosine distance, I would recommend the FLAME clustering algorithm.

    It's intuitive, easy to implement, and has implementations in a large number of languages (not PHP though, largely because very few people use PHP for data science).

    It is also good enough to be used in research by a large number of people. If nothing else, working with it will show you exactly which shortcomings you want to address when you move on to another algorithm.
