kmeans: Quick-TRANSfer stage steps exceeded maximum

暗喜 2021-02-01 15:04

I am running k-means clustering in R on a dataset with 636,688 rows and 7 columns using the standard stats package: kmeans(dataset, centers = 100, nstart = 25

4 Answers
  • 2021-02-01 15:28

    I had the same problem; it seems to be related to available memory.

    Running garbage collection before calling the function worked for me:

    gc()
    

    Alternatively, see:

    Increasing (or decreasing) the memory available to R processes

  • 2021-02-01 15:28

    I got the same error message, but in my case it helped to increase the maximum number of iterations, iter.max. That contradicts the memory-overload theory.
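
    As a rough sketch of that fix, the snippet below raises iter.max on synthetic data and then inspects the iter and ifault fields of the result. The matrix X, the seed, and the values 5 and 100 are placeholders chosen for illustration, not taken from the question.

    ```r
    set.seed(42)                          # make the random starts reproducible
    X <- matrix(rnorm(2000), ncol = 2)    # placeholder data: 1000 rows, 2 columns

    # iter.max defaults to 10; raise the cap so the algorithm has room to converge.
    kms <- kmeans(X, centers = 5, nstart = 5, iter.max = 100)

    print(kms$iter)    # outer iterations used by the returned fit
    print(kms$ifault)  # 0 means no algorithm problem was flagged
    ```

    If iter still hits the cap, the fit most likely has not converged and a larger iter.max may be worth trying.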

  • 2021-02-01 15:29

    @jlhoward's comment:

    Try

    kmeans(dataset, algorithm = "Lloyd", ...)
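
    To make the suggestion concrete, here is a small sketch that runs the three algorithms from the same starting centers so their results can be compared. The names X, init, and fits are illustrative, and the data is synthetic rather than the dataset from the question.

    ```r
    set.seed(1)
    X <- matrix(rnorm(2000), ncol = 2)       # placeholder data: 1000 rows, 2 columns
    init <- X[sample(nrow(X), 5), ]          # five data rows used as shared starting centers

    algorithms <- c("Hartigan-Wong", "Lloyd", "MacQueen")

    # Fit once per algorithm, always starting from the same centers.
    fits <- lapply(algorithms, function(alg) {
      kmeans(X, centers = init, iter.max = 100, algorithm = alg)
    })
    names(fits) <- algorithms

    # Total within-cluster sum of squares per algorithm (lower is better).
    print(sapply(fits, function(f) f$tot.withinss))
    ```

    Only Hartigan-Wong has a Quick-Transfer stage, so switching to Lloyd or MacQueen avoids this particular warning, possibly at the cost of a slightly worse fit.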
    
  • 2021-02-01 15:35

    I just had the same issue.

    See the documentation of kmeans in R via ?kmeans:

    The Hartigan-Wong algorithm generally does a better job than either of those, but trying several random starts (‘nstart’> 1) is often recommended. In rare cases, when some of the points (rows of ‘x’) are extremely close, the algorithm may not converge in the “Quick-Transfer” stage, signalling a warning (and returning ‘ifault = 4’). Slight rounding of the data may be advisable in that case.

    In these cases, you may need to switch to the Lloyd or MacQueen algorithms.

    The nasty thing about R here is that it continues with a warning that may go unnoticed. For my benchmark purposes, I consider this to be a failed run, and thus I use:

    # Treat a Quick-Transfer convergence failure as a failed run.
    if (kms$ifault == 4) { stop("Failed in Quick-Transfer") }
    

    Depending on your use case, you may want to do something like

    # Continue from the current centers using MacQueen's algorithm.
    if (kms$ifault == 4) { kms <- kmeans(X, kms$centers, algorithm = "MacQueen") }
    

    instead, to continue with a different algorithm.

    If you are benchmarking k-means, note that R uses iter.max = 10 by default. It may take many more than 10 iterations to converge.
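
    Since the warning can go unnoticed, one option for benchmarks is to promote every kmeans warning to an error. The sketch below does this with tryCatch; kmeans_strict is a hypothetical helper name, not something provided by the stats package, and the data is synthetic.

    ```r
    # Hypothetical helper (not part of stats): turn any kmeans warning into an error.
    kmeans_strict <- function(X, centers, ...) {
      tryCatch(
        kmeans(X, centers, ...),
        warning = function(w) {
          stop("kmeans warned: ", conditionMessage(w), call. = FALSE)
        }
      )
    }

    set.seed(1)
    # Placeholder data: two well-separated groups of 100 points each.
    X <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
               matrix(rnorm(200, mean = 10), ncol = 2))

    # With a generous iteration cap this is expected to finish without warnings.
    kms <- kmeans_strict(X, centers = 2, iter.max = 100)
    print(kms$size)
    ```

    With a deliberately tiny cap, such as iter.max = 1 together with algorithm = "Lloyd", the same call should stop with an error instead of silently returning a non-converged fit.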
