1D Number Array Clustering [duplicate]

≡放荡痞女 提交于 2019-11-26 15:48:01
Anony-Mousse

Don't use multidimensional clustering algorithms for a one-dimensional problem. A single dimension is much more special than you naively think, because you can actually sort it, which makes things a lot easier.

In fact, it is usually not even called clustering, but e.g. segmentation or natural breaks optimization.

You might want to look at Jenks Natural Breaks Optimization and similar statistical methods. Kernel Density Estimation is also a good method to look at, with a strong statistical background. Local minima in density are be good places to split the data into clusters, with statistical reasons to do so. KDE is maybe the most sound method for clustering 1-dimensional data.

With KDE, it again becomes obvious that 1-dimensional data is much more well behaved. In 1D, you have local minima; but in 2D you may have saddle points and such "maybe" splitting points. See this Wikipedia illustration of a saddle point, as how such a point may or may not be appropriate for splitting clusters.

You may look for discretize algorithms. 1D discretization problem is a lot similar to what you are asking. They decide cut-off points, according to frequency, binning strategy etc.

weka uses following algorithms in its , discretization process.

weka.filters.supervised.attribute.Discretize

uses either Fayyad & Irani's MDL method or Kononeko's MDL criterion

weka.filters.unsupervised.attribute.Discretize

uses simple binning

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!