Clustering algorithm with different epsilons on different axes

Submitted by 一曲冷凌霜 on 2019-12-11 03:57:01

Question


I am looking for a clustering algorithm such as DBSCAN to deal with 3D data, in which it is possible to set different epsilons depending on the axis. For instance, an epsilon of 10 m in the x-y plane and an epsilon of 0.2 m on the z axis.

Essentially, I am looking for large but flat clusters.

Note: I am an archaeologist. The algorithm will be used to look for potential correlations between objects scattered over large surfaces but within narrow vertical layers.


Answer 1:


Solution 1:

Scale your data set to match your desired epsilon.

In your case, multiply z by 50 (10 m / 0.2 m) and then run DBSCAN with a single epsilon of 10.
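As a rough illustration of the scaling idea, here is a minimal sketch in plain Python. The function name `scale_z` and the sample coordinates are made up for this example. After scaling, the data can be handed to any ordinary DBSCAN implementation with an epsilon of 10.

```python
import math

def scale_z(points, eps_xy=10.0, eps_z=0.2):
    """Return copies of (x, y, z) points with z multiplied by eps_xy / eps_z."""
    factor = eps_xy / eps_z  # 10 / 0.2 = 50 for the values in the question
    return [(x, y, z * factor) for x, y, z in points]

# Hypothetical finds: the second differs from the first only by 0.2 m in depth,
# the third only by 10 m horizontally.
points = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.2), (10.0, 0.0, 0.0)]
scaled = scale_z(points)

# After scaling, both neighbours are equally far from the first point,
# so one epsilon of 10 treats the two offsets alike.
d_vertical = math.dist(scaled[0], scaled[1])
d_horizontal = math.dist(scaled[0], scaled[2])
```

Remember that the z coordinates of the resulting clusters are in scaled units; divide by the same factor to get back to metres.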

Solution 2:

Use a weighted distance function.

For example, use WeightedEuclideanDistanceFunction in ELKI and choose your weights accordingly; -distance.weights 1,1,50 will put 50x as much weight on the third axis.

This may be the most convenient option, since you are already using ELKI.




Answer 2:


Just define a custom distance metric for the step where DBSCAN determines its core points. Standard DBSCAN uses the Euclidean distance to find the points within epsilon of each other, so all dimensions are treated the same.

However, you could use the Mahalanobis distance to weigh each dimension differently. You can use a diagonal covariance matrix for flat clusters. You can use a full symmetric covariance matrix for flat tilted clusters, etc.

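In your case, you would use a covariance matrix like the following, together with an epsilon of 1 (the diagonal entries are the squared per-axis epsilons, 10² and 0.2²):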

100  0    0   
  0  100  0   
  0    0  0.04
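To make the effect of this matrix concrete, here is a small sketch of the Mahalanobis distance restricted to the diagonal case, in plain Python. The name `mahalanobis_diag` is invented for this example, and a full (tilted) covariance matrix would need a proper matrix inverse instead.

```python
import math

# Diagonal of the covariance matrix above: squared per-axis epsilons.
VARIANCES = (100.0, 100.0, 0.04)

def mahalanobis_diag(p, q, variances=VARIANCES):
    """Mahalanobis distance between p and q for a diagonal covariance matrix."""
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(p, q, variances)))

origin = (0.0, 0.0, 0.0)
d_horizontal = mahalanobis_diag(origin, (10.0, 0.0, 0.0))  # 10 m along x
d_vertical = mahalanobis_diag(origin, (0.0, 0.0, 0.2))     # 0.2 m along z
```

Both offsets come out at a distance of about 1, which is why an epsilon of 1 corresponds to 10 m horizontally and 0.2 m vertically.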

In the pseudocode given in the Wikipedia entry for DBSCAN, just use one of the distance metrics suggested above inside the regionQuery function.
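The following is a minimal, unoptimized sketch of that idea: a plain-Python DBSCAN whose region query takes the distance function as a parameter. It is loosely modelled on the textbook pseudocode rather than on any particular library. The names `flat_distance`, `region_query` and `dbscan`, the `min_pts` value and the sample points are all assumptions made for illustration.

```python
import math

NOISE = -1

def flat_distance(p, q, eps_xy=10.0, eps_z=0.2):
    """Distance in which 10 m horizontally and 0.2 m vertically both equal 1."""
    dx, dy, dz = (a - b for a, b in zip(p, q))
    return math.sqrt((dx / eps_xy) ** 2 + (dy / eps_xy) ** 2 + (dz / eps_z) ** 2)

def region_query(points, i, eps, dist):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts, dist):
    """O(n^2) DBSCAN; returns one cluster label per point, -1 meaning noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps, dist)
        if len(neighbours) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps, dist)
            if len(j_neighbours) >= min_pts:  # only core points expand the cluster
                queue.extend(j_neighbours)
    return labels

# Hypothetical data: two thin layers 1 m apart in depth, plus one stray find.
points = [
    (0, 0, 0.00), (4, 0, 0.05), (8, 0, 0.00), (12, 0, 0.05),
    (0, 0, 1.00), (4, 0, 1.05), (8, 0, 1.00), (12, 0, 1.05),
    (100, 100, 5.0),
]
labels_flat = dbscan(points, eps=1.0, min_pts=3, dist=flat_distance)
labels_plain = dbscan(points, eps=10.0, min_pts=3, dist=math.dist)
```

With the axis-aware metric the two layers end up in separate clusters, whereas plain Euclidean distance with an epsilon of 10 merges them into one.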

Update

Note: scaling the data is equivalent to using an appropriate metric.



Source: https://stackoverflow.com/questions/31073628/clustering-algorithm-with-different-epsilons-on-different-axes
