Affinity Propagation preferences initialization

后端未结


I need to perform clustering without knowing the number of clusters in advance. The number of clusters may be anywhere from 1 to 5, since I may find cases where all the samples belong to


3 Answers

误落风尘

2021-02-20 05:42

You can also merge clusters afterwards, either by running the algorithm a second time on the center samples (the exemplars) or by manually merging the most similar clusters. That way you can iteratively merge the closest clusters until you reach the number you want. This makes the choice of preference easier, since any value that yields a decent number of clusters will do. (This worked decently well when I tried it.)
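The manual merging idea could be sketched roughly as follows. This is only an illustration under assumptions: `merge_closest_clusters` is a hypothetical helper (not part of scikit-learn), the data are synthetic, and clusters are compared by the squared distance between their centroids rather than between their exemplars.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation


def merge_closest_clusters(X, labels, target_k):
    """Hypothetical helper: greedily merge the two clusters whose
    centroids are closest until at most target_k clusters remain."""
    labels = np.asarray(labels).copy()
    while True:
        ids = np.unique(labels)
        if len(ids) <= target_k:
            return labels
        # Centroid of every remaining cluster.
        centroids = np.array([X[labels == c].mean(axis=0) for c in ids])
        # Pairwise squared distances between centroids.
        d = ((centroids[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(d, np.inf)  # never merge a cluster with itself
        a, b = np.unravel_index(np.argmin(d), d.shape)
        labels[labels == ids[b]] = ids[a]  # fold cluster b into cluster a


# Synthetic data (an assumption for illustration): three blobs in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2))
               for c in ((0, 0), (6, 0), (0, 6))])

# Any preference that gives a reasonable number of clusters will do here;
# the default (median similarity) is used as a starting point.
ap = AffinityPropagation(damping=0.9, max_iter=1000, random_state=0).fit(X)
merged = merge_closest_clusters(X, ap.labels_, target_k=3)

print("clusters before merging:", len(np.unique(ap.labels_)))
print("clusters after merging: ", len(np.unique(merged)))
```

Merging by centroid is one choice among several; merging by exemplar similarity, or re-running AP on the exemplars only, would follow the same loop structure.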

半阙折子戏

2021-02-20 05:46

No, there is no flaw. AP does not use distances; it requires you to specify a similarity. I don't know the scikit implementation that well, but according to what I have read, it uses negative squared Euclidean distances by default to compute the similarity matrix. If you set the input preference to the minimal Euclidean distance, you get a positive value, while all similarities are negative. This will typically result in as many clusters as you have samples (note: the higher the input preference, the more clusters).

I would rather suggest setting the input preference to the minimal negative squared distance, i.e. -1 times the square of the largest distance in the data set. This will give you a much smaller number of clusters, but not necessarily a single cluster.

There is also a preferenceRange() function that determines meaningful bounds for the input preference parameter. It is available in the Matlab code on the AP homepage and in the R package 'apcluster', which I maintain. I don't know whether it also exists in the scikit implementation. I hope that helps.
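To make the effect of the sign concrete, here is a minimal sketch using scikit-learn's `AffinityPropagation` with its default `affinity="euclidean"`. The synthetic data, the `damping` and `max_iter` values, and the helper name `count_clusters` are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Synthetic data (an assumption for illustration): three blobs in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 2))
               for c in ((0, 0), (6, 0), (0, 6))])

# Pairwise squared Euclidean distances, diagonal excluded.
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
off_diag = sq_dist[~np.eye(len(X), dtype=bool)]


def count_clusters(preference):
    """Hypothetical helper: fit AP with the given preference and
    return the number of exemplars it found."""
    ap = AffinityPropagation(preference=preference, damping=0.9,
                             max_iter=1000, random_state=0).fit(X)
    return len(ap.cluster_centers_indices_)


# Positive preference (minimal Euclidean distance): larger than every
# similarity, because the similarities are negative squared distances.
pref_high = np.sqrt(off_diag.min())

# Minimal similarity: -1 times the square of the largest distance.
pref_low = -off_diag.max()

n_high = count_clusters(pref_high)
n_low = count_clusters(pref_low)
print("positive preference       ->", n_high, "clusters")
print("minimal-similarity choice ->", n_low, "clusters")
```

The exact count for the low preference depends on the data, so it is worth printing it rather than assuming it will be one.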

佛祖请我去吃肉

2021-02-20 05:48

You can control it by setting the preference to its minimum, but that does not guarantee that you will end up with a single cluster.

I would also advise against forcing a single cluster, because it can introduce errors: some data points may not be similar to the exemplar at all, yet with the minimum preference AP will still group them with it.
