Scikit K-means clustering performance measure

自闭症患者 2021-02-04 08:42

I'm trying to do clustering with the K-means method, and I would like to measure the performance of my clustering. I'm not an expert, but I am eager to learn more about clustering.

3 answers
  •  一个人的身影
    2021-02-04 08:45

    Clustering is normally treated as an unsupervised method, so it is difficult to establish a good performance metric (as the previous comments also suggest).

    Nevertheless, a lot of useful information can be extracted from these algorithms (e.g. k-means). The problem is how to assign a meaning to each cluster, and thus how to measure the "performance" of your algorithm. In many cases, a good way to proceed is to visualize your clusters. Of course, if your data has high-dimensional features, as is often the case, visualization is not that easy. Let me suggest two ways to go: one using k-means and one using another clustering algorithm.

    • K-means: in this case, you can reduce the dimensionality of your data using, for example, PCA. After the reduction you can draw the data in a 2D plot and visualize your clusters. Keep in mind that what you see is a 2D projection of your data, so it may not be very accurate, but it can still give you an idea of how your clusters are distributed.

    • Self-organizing map (SOM): this is a clustering algorithm based on neural networks that creates a discretized representation of the input space of the training samples, called a map, and is therefore also a method for dimensionality reduction. There is a very nice Python package called somoclu that implements this algorithm and offers an easy way to visualize the result. This algorithm is also good for clustering because it does not require choosing the number of clusters a priori (in k-means you need to choose k; here you do not).
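
For the k-means route, here is a minimal sketch of the PCA idea using scikit-learn. The synthetic data from `make_blobs`, the choice of 4 clusters, and the helper names `cluster_and_project` and `plot_clusters` are assumptions made for illustration, not something from the question. Clustering is done in the original feature space, and PCA is used only for the 2D picture.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA


def cluster_and_project(X, n_clusters, random_state=0):
    """Cluster X with k-means, then project points and centroids to 2D with PCA."""
    # Fit k-means on the full-dimensional data (hypothetical helper, not a library API).
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = kmeans.fit_predict(X)

    # PCA is used only for visualization; it does not influence the clustering.
    pca = PCA(n_components=2, random_state=random_state)
    X_2d = pca.fit_transform(X)
    centers_2d = pca.transform(kmeans.cluster_centers_)

    # Fraction of total variance kept by the two plotted components.
    explained = float(pca.explained_variance_ratio_.sum())
    return X_2d, labels, centers_2d, explained


def plot_clusters(X_2d, labels, centers_2d, path="clusters.png"):
    """Save a scatter plot of the projected points, colored by cluster label."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(X_2d[:, 0], X_2d[:, 1], c=labels, s=15)
    ax.scatter(centers_2d[:, 0], centers_2d[:, 1], c="red", marker="x", s=100)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    fig.savefig(path)
    plt.close(fig)


if __name__ == "__main__":
    # Synthetic stand-in for real data: 300 samples with 10 features.
    X, _ = make_blobs(n_samples=300, n_features=10, centers=4, random_state=0)
    X_2d, labels, centers_2d, explained = cluster_and_project(X, n_clusters=4)
    print(f"Variance kept by the 2D projection: {explained:.2f}")
    plot_clusters(X_2d, labels, centers_2d)
```

The printed variance ratio gives a rough sense of how much the 2D picture can be trusted: if it is low, clusters that overlap in the plot may still be well separated in the original space.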
