How to index with ELKI - OPTICS clustering

Submitted by 跟風遠走 on 2019-12-25 14:24:10

Question


I'm an ELKI beginner, and I've been using it to cluster around 10K lat-lon points from a .csv file. Once I get my settings right, I'd like to scale up to 1 million points.

I'm using the OPTICSXi algorithm with LngLatDistanceFunction.

I keep reading about "enabling R*-tree index with STR bulk loading" in order to see vast improvements in performance. The tutorials haven't helped me much.

Any tips on how I can implement this feature?


Answer 1:


The suggested parameters for using a spatial R*-tree index on 2-dimensional data are:

-db.index tree.spatial.rstarvariants.rstar.RStarTreeFactory
-pagefile.pagesize 512
-spatial.bulkstrategy SortTileRecursiveBulkSplit

For higher-dimensional data, larger page sizes are necessary. A page size of 512-1024 bytes seems to be the sweet spot for 2-dimensional data, but it does depend on your data, too.
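
To give some intuition for what STR (Sort-Tile-Recursive) bulk loading does with a given page capacity, here is a minimal Python sketch of the leaf-packing step for 2-dimensional points. This is not ELKI's implementation; the function names and the `capacity` parameter (number of entries that fit on one page) are my own, purely for illustration.

```python
import math


def str_pack_leaves(points, capacity):
    """Group 2-D points into leaf pages of at most `capacity` entries (STR)."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    n = len(points)
    if n == 0:
        return []
    num_leaves = math.ceil(n / capacity)          # pages needed
    num_slices = math.ceil(math.sqrt(num_leaves))  # vertical slices
    slice_size = num_slices * capacity             # points per slice
    # Step 1: sort by the first coordinate and cut into vertical slices.
    by_x = sorted(points, key=lambda p: p[0])
    leaves = []
    for i in range(0, n, slice_size):
        # Step 2: inside each slice, sort by the second coordinate
        # and pack consecutive runs of `capacity` points into one leaf.
        vertical_slice = sorted(by_x[i:i + slice_size], key=lambda p: p[1])
        for j in range(0, len(vertical_slice), capacity):
            leaves.append(vertical_slice[j:j + capacity])
    return leaves


def bounding_box(leaf):
    """Minimum bounding rectangle (min_x, min_y, max_x, max_y) of one leaf."""
    xs = [p[0] for p in leaf]
    ys = [p[1] for p in leaf]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical data: a 10 x 10 grid of points, 4 entries per page.
grid = [(float(x), float(y)) for x in range(10) for y in range(10)]
leaves = str_pack_leaves(grid, capacity=4)
print(len(leaves), "leaves; first bounding box:", bounding_box(leaves[0]))
```

The point of packing this way is that each leaf ends up covering a small, compact rectangle, so a range query with a small radius only has to touch a few pages.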

To discretize clusters, you can use the Xi extraction:

-algorithm clustering.optics.OPTICSXi -opticsxi.xi 0.005

To benefit from index acceleration with OPTICS, choose epsilon as small as possible for your application. The parameter is in meters with all the earth models in ELKI.

-opticsxi.algorithm OPTICSHeap
-algorithm.distancefunction geo.LatLngDistanceFunction
-optics.epsilon 2000.0 -optics.minpts 10

This uses a maximum distance of 2 km (2000 meters).
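
Putting the flags from this answer together, a full invocation could be assembled roughly as in the sketch below. Note the assumptions: the jar name `elki.jar`, the `KDDCLIApplication` entry point, the `-dbc.in` input parameter, and the file name `points.csv` are not part of the parameters listed above; they are what I would expect for a typical ELKI command line, and parameter names can differ between ELKI versions, so check them against your release.

```python
import shlex


def build_elki_args(csv_path, epsilon_m=2000.0, minpts=10, xi=0.005,
                    distance="geo.LatLngDistanceFunction", pagesize=512):
    """Return the ELKI command line as an argument list (nothing is executed)."""
    return [
        # Assumed launcher and input parameter (not from the answer above).
        "java", "-jar", "elki.jar", "KDDCLIApplication",
        "-dbc.in", csv_path,
        # R*-tree index with STR bulk loading.
        "-db.index", "tree.spatial.rstarvariants.rstar.RStarTreeFactory",
        "-pagefile.pagesize", str(pagesize),
        "-spatial.bulkstrategy", "SortTileRecursiveBulkSplit",
        # OPTICS with Xi cluster extraction.
        "-algorithm", "clustering.optics.OPTICSXi",
        "-opticsxi.xi", str(xi),
        "-opticsxi.algorithm", "OPTICSHeap",
        # Distance function and OPTICS parameters (epsilon in meters).
        "-algorithm.distancefunction", distance,
        "-optics.epsilon", str(epsilon_m),
        "-optics.minpts", str(minpts),
    ]


# Print a shell-ready command line for a hypothetical input file.
args = build_elki_args("points.csv")
print(" ".join(shlex.quote(a) for a in args))
```

If your file stores longitude first, pass `distance="geo.LngLatDistanceFunction"` instead of the default.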

Make sure to distinguish (latitude, longitude) from (longitude, latitude) column order. Both orders are used, and you need to pick the distance function that matches your data:

geo.LatLngDistanceFunction
geo.LngLatDistanceFunction
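
To see why the coordinate order matters once epsilon is a distance in meters, here is a small Python sketch using the haversine formula on a spherical Earth. This is only an approximation for illustration: the radius constant and the two sample points are my own assumptions, and ELKI's earth models may compute distances differently.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


# Two hypothetical points, stored as (latitude, longitude).
a = (48.15, 11.58)
b = (48.15, 11.60)

# Correct interpretation: first column is latitude.
correct = haversine_m(a[0], a[1], b[0], b[1])
# Wrong interpretation: the same columns read as (longitude, latitude).
swapped = haversine_m(a[1], a[0], b[1], b[0])

epsilon_m = 2000.0
print("correct order within epsilon:", correct <= epsilon_m)
print("swapped order within epsilon:", swapped <= epsilon_m)
```

With these sample coordinates, the pair falls inside a 2000 m epsilon when read in the right order and outside it when the columns are swapped, so a mismatched distance function silently changes which points count as neighbors.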


Source: https://stackoverflow.com/questions/32741510/how-to-index-with-elki-optics-clustering
