MATLAB kmeans: “Empty cluster created at iteration 1” error

悲&欢浪女 2021-02-08 18:14

I'm using this script to cluster a set of 3D points with MATLAB's kmeans function, but I always get the error "Empty cluster created at iteration 1". The script I'm usin…

3 answers
  •  爱一瞬间的悲伤
    2021-02-08 18:46

    Amro described the reason clearly:

    It is simply telling you that during the assign-recompute iterations, a cluster became empty (lost all of its assigned points). This is usually caused by an inadequate cluster initialization, or by the data having fewer inherent clusters than you specified.
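To see how a cluster can already be empty on the very first pass, here is a minimal Python sketch of one k-means assignment step. It is an illustration of the mechanism, not MATLAB's implementation. Assumptions: squared Euclidean distance, ties go to the lowest-numbered centroid, and the initial centroids are the three observations used in the example further down; the helper names `sq_dist`, `assign` and `empty_clusters` are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point.
    Ties go to the lowest index (list.index returns the first minimum)."""
    labels = []
    for p in points:
        dists = [sq_dist(p, c) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def empty_clusters(labels, k):
    """Indices of clusters that received no points in this assignment step."""
    return [j for j in range(k) if j not in labels]


# Assumption: the initial centroids are the three observations themselves,
# so two of the centroids coincide at (1, 2).
X = [(1, 2), (1, 2), (2, 3)]
centroids = list(X)
labels = assign(X, centroids)
print("labels:", labels)
print("empty clusters:", empty_clusters(labels, len(centroids)))
```

Both copies of (1, 2) are equally close to centroids 0 and 1, so under this tie rule both go to centroid 0 and centroid 1 ends up with no members (labels `[0, 0, 2]`, empty cluster `[1]`). That is the situation the error message reports.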

    Another option that can help with this problem is the emptyaction parameter:

    Action to take if a cluster loses all its member observations.

    error: Treat an empty cluster as an error (default).

    drop: Remove any clusters that become empty. kmeans sets the corresponding return values in C and D to NaN. (See the kmeans documentation page for details on C and D.)

    singleton: Create a new cluster consisting of the one point furthest from its centroid.
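The three behaviours can be sketched in plain Python as a variation of Lloyd's algorithm. This is an illustrative assumption, not MATLAB's code: `kmeans_sketch` and its `empty_action` argument are hypothetical, a dropped centroid is represented by `None` instead of a NaN row, and the singleton rule is simplified to "move the point furthest from its current centroid into the empty cluster".

```python
import warnings


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans_sketch(X, init, empty_action="error", max_iter=100):
    """Lloyd's k-means with a hypothetical empty_action argument that mimics
    the 'error', 'drop' and 'singleton' options (simplified, not MATLAB's code).
    Returns (labels, centroids); a dropped centroid is None."""
    if empty_action not in ("error", "drop", "singleton"):
        raise ValueError(f"unknown empty_action: {empty_action!r}")
    centroids = [tuple(float(v) for v in c) for c in init]
    k = len(centroids)
    labels = None
    for it in range(1, max_iter + 1):
        # Assignment step: nearest live centroid, ties go to the lowest index.
        live = [j for j in range(k) if centroids[j] is not None]
        new_labels = [min(live, key=lambda j: sq_dist(p, centroids[j])) for p in X]

        # Handle every live cluster that received no points.
        moved = set()
        for j in live:
            if j in new_labels:
                continue
            if empty_action == "error":
                raise RuntimeError(f"Empty cluster created at iteration {it}.")
            if empty_action == "drop":
                warnings.warn(f"Dropping empty cluster {j} at iteration {it}.")
                centroids[j] = None  # plays the role of a NaN row in C
            else:  # "singleton"
                warnings.warn(f"Replacing empty cluster {j} at iteration {it}.")
                # Move the point furthest from its centroid into cluster j.
                candidates = [i for i in range(len(X)) if i not in moved]
                far = max(candidates,
                          key=lambda i: sq_dist(X[i], centroids[new_labels[i]]))
                new_labels[far] = j
                centroids[j] = tuple(float(v) for v in X[far])
                moved.add(far)

        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels

        # Update step: move each live centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(X, labels) if lab == j]
            if centroids[j] is not None and members:
                centroids[j] = mean(members)
    return labels, centroids


X = [(1, 2), (1, 2), (2, 3)]
for action in ("singleton", "drop", "error"):
    try:
        print(action, kmeans_sketch(X, X, action))
    except RuntimeError as err:
        print(action, "->", err)
```

With the initial centroids set to the rows of X, this sketch reproduces the pattern described in the example: "singleton" keeps three clusters with two coincident centroids at (1, 2), "drop" returns labels `[0, 0, 2]` with `None` in place of the dropped centroid, and "error" stops at iteration 1. The sketch assumes k does not exceed the number of points.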


    An example:

    Let's run some simple code to see how this option changes the behavior and results of kmeans. The sample below tries to partition 3 observations into 3 clusters, where 2 of them are located at the same point:

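    clc;
    X = [1 2; 1 2; 2 3];  % 3 observations; the first two are identical
    [I, C] = kmeans(X, 3, 'emptyaction', 'singleton');  % replace an empty cluster with a singleton
    [I, C] = kmeans(X, 3, 'emptyaction', 'drop');       % drop an empty cluster (NaN row in C)
    [I, C] = kmeans(X, 3, 'emptyaction', 'error')       % default: stop with an error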
    

    The first call, with the singleton option, displays a warning and returns the following (kmeans chooses its initial centroids randomly, so your output may differ):

    I =                               C =
     3                                 2     3
     2                                 1     2
     1                                 1     2
    

    As you can see, two cluster centroids are created at the same location ([1 2]), and the first two rows of X are assigned to these two clusters.

    The second call, with the drop option, displays the same warning message but returns different results:

    I =                               C =
     1                                 1     2
     1                               NaN   NaN
     3                                 2     3
    

    It returns only two cluster centers and assigns the first two rows of X to the same cluster. I think this option is the most useful one in most cases: when observations are too close together and we want as many cluster centers as possible, we can let MATLAB decide on the actual number. You can remove the NaN rows from C like this:

    C(any(isnan(C), 2), :) = [];  % delete the NaN rows left by dropped clusters
    

    Finally, the third call throws an error and halts the program, as expected:

    Empty cluster created at iteration 1.
