ELKI: Running DBSCAN on custom Objects in Java

Posted by 瘦欲@ on 2019-12-23 10:27:04

Question


I'm trying to use ELKI from within Java to run DBSCAN. For testing I used a FileBasedDatabaseConnection. Now I would like to run DBSCAN with my custom objects as input.

My objects have the following structure:

public class MyObject {
  private Long id;
  private Float param1;
  private Float param2;
  // ... and more parameters as well as getters and setters
}

I'd like to run DBSCAN within ELKI using a List<MyObject> as the database, but only some of the parameters should be taken into account (e.g. running DBSCAN on the objects using the parameters param1, param2 and param4). Ideally the resulting clusters contain the whole objects.

Is there any way to achieve this behaviour?

If not, how can I convert the objects into a format that ELKI understands and allows me to match the resulting cluster-objects with my custom objects (i.e. is there an easy way to programmatically set a label)?

The following question mentions feature vectors: Using ELKI on custom objects and making sense of results
Could this be a possible solution to my problem? And how is a feature vector created from my List<MyObject>?


Answer 1:


ELKI has a modular architecture.

If you want your own data source, look at the datasource package, and implement the DatabaseConnection (JavaDoc) interface.

If you want to process MyObject objects directly (the class you shared above will likely carry a substantial performance cost), that is not particularly hard. You need a SimpleTypeInformation<MyObject> (JavaDoc) to identify your data type, and you need to implement a PrimitiveDistanceFunction (JavaDoc) for your data type.
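To illustrate what such a distance would compute, here is a minimal standalone sketch of a Euclidean distance over a chosen subset of fields. It deliberately uses no ELKI classes, so it is not an ELKI PrimitiveDistanceFunction implementation; the class name SelectedFieldDistance and the reduced MyObject with primitive float fields are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Standalone sketch: Euclidean distance over selected fields of MyObject.
// In ELKI, this logic would sit inside a distance function implementation.
class SelectedFieldDistance {
  // Reduced, hypothetical version of the MyObject class from the question.
  static final class MyObject {
    final long id;
    final float param1;
    final float param2;
    final float param3;

    MyObject(long id, float param1, float param2, float param3) {
      this.id = id;
      this.param1 = param1;
      this.param2 = param2;
      this.param3 = param3;
    }
  }

  // Accessors for the fields that take part in the distance.
  private final List<ToDoubleFunction<MyObject>> fields;

  SelectedFieldDistance(List<ToDoubleFunction<MyObject>> fields) {
    this.fields = fields;
  }

  // Square root of the summed squared differences over the selected fields.
  double distance(MyObject a, MyObject b) {
    double sum = 0.0;
    for (ToDoubleFunction<MyObject> field : fields) {
      double diff = field.applyAsDouble(a) - field.applyAsDouble(b);
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  public static void main(String[] args) {
    List<ToDoubleFunction<MyObject>> selected = new ArrayList<>();
    selected.add(o -> o.param1);
    selected.add(o -> o.param2); // param3 is ignored on purpose

    SelectedFieldDistance dist = new SelectedFieldDistance(selected);
    MyObject a = new MyObject(1L, 0f, 0f, 100f);
    MyObject b = new MyObject(2L, 3f, 4f, -50f);
    System.out.println(dist.distance(a, b));
  }
}
```

Passing the field accessors in as a list keeps the choice of attributes outside the distance logic, so the same class can be reused for a different attribute subset.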

If your actual data are floats, I suggest using DoubleVector or FloatVector instead, together with e.g. SubspaceEuclideanDistanceFunction so that only the attributes you want are taken into account.
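For the vector route, one way to get from a List<MyObject> to numeric input is a plain double[][] that holds one row per object. The sketch below is an illustration under assumptions: MyObject is a reduced, hypothetical version with primitive float fields, and the class name FeatureMatrixBuilder is made up. Row i corresponds to list position i, which is what makes results matchable afterwards. ELKI provides an ArrayAdapterDatabaseConnection that accepts such an array; that wiring is left out here because package names differ between ELKI versions.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch: turn a list of objects into a numeric feature matrix.
class FeatureMatrixBuilder {
  // Reduced, hypothetical version of the MyObject class from the question.
  static final class MyObject {
    private final long id;
    private final float param1;
    private final float param2;
    private final float param3;
    private final float param4;

    MyObject(long id, float param1, float param2, float param3, float param4) {
      this.id = id;
      this.param1 = param1;
      this.param2 = param2;
      this.param3 = param3;
      this.param4 = param4;
    }

    long getId() { return id; }
    float getParam1() { return param1; }
    float getParam2() { return param2; }
    float getParam3() { return param3; }
    float getParam4() { return param4; }
  }

  // Copies all numeric attributes; row i belongs to objects.get(i).
  // Which columns are used for clustering can then be chosen by the
  // distance function, or by copying only the wanted columns here.
  static double[][] toFeatureMatrix(List<MyObject> objects) {
    double[][] data = new double[objects.size()][];
    for (int i = 0; i < objects.size(); i++) {
      MyObject o = objects.get(i);
      data[i] = new double[] {
        o.getParam1(), o.getParam2(), o.getParam3(), o.getParam4()
      };
    }
    return data;
  }

  public static void main(String[] args) {
    List<MyObject> objects = new ArrayList<>();
    objects.add(new MyObject(10L, 1.5f, 2.0f, 3.0f, 4.0f));
    objects.add(new MyObject(11L, 0.5f, 0.25f, 8.0f, 16.0f));

    double[][] data = toFeatureMatrix(objects);
    System.out.println(data.length + " rows, " + data[0].length + " columns");
  }
}
```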

For these data types and many distance functions, R*-tree indexes can be used to substantially speed up DBSCAN execution.

A Cluster (JavaDoc) in ELKI never stores the point data. It only stores point DBIDs (Wiki). You can get the point data from the Database relation, or use e.g. offsets (Wiki) to map them back to a list position for static databases.
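The mapping from offsets back to the original list can be sketched in plain Java. In ELKI the offsets would come from the DBIDs of each cluster in a static database (for example via DBIDRange.getOffset); here they are passed in as plain int arrays, which is an assumption made to keep the example free of ELKI dependencies. The class name ClusterMapper is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone sketch: resolve cluster member offsets to the original objects.
class ClusterMapper {
  // clusterOffsets holds one int[] per cluster; each entry is a position in
  // the original list, assuming the data was loaded in list order.
  static <T> List<List<T>> mapClusters(List<int[]> clusterOffsets, List<T> objects) {
    List<List<T>> result = new ArrayList<>();
    for (int[] offsets : clusterOffsets) {
      List<T> members = new ArrayList<>(offsets.length);
      for (int offset : offsets) {
        members.add(objects.get(offset));
      }
      result.add(members);
    }
    return result;
  }

  public static void main(String[] args) {
    // Strings stand in for MyObject instances.
    List<String> objects = Arrays.asList("a", "b", "c", "d");
    List<int[]> clusterOffsets = new ArrayList<>();
    clusterOffsets.add(new int[] {0, 2});
    clusterOffsets.add(new int[] {1, 3});

    System.out.println(mapClusters(clusterOffsets, objects));
  }
}
```

This only works if the order of the input list is unchanged between loading the data and reading the result, which is why the answer restricts the offset approach to static databases.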



Source: https://stackoverflow.com/questions/30893319/elki-running-dbscan-on-custom-objects-in-java
