Question
I keep stumbling upon ELKI these past couple of days while searching for the most suitable density-based clustering tool, so I decided to try it. For DBSCAN, I've successfully reproduced the test that clusters the file "3clusters-and-noise-2d.csv", and I've also managed to print the cluster metadata and the points in each cluster, all via ELKI code from GitHub (latest version) in Java (I'm not really interested in the CLI or UI tool).
Now I want to create the database from an internal Java structure instead of importing a file, to avoid the write and read overhead.
With the example provided I can do this, but only for the first column of the file.
My question, basically: how do I create the same database that was created from the file when the same data already exists in Java?
Got it!
After some tweaking: use a 2D array of doubles in which each row represents a point and there are as many columns as dimensions. To create the database without reading a file, use an ArrayAdapterDatabaseConnection as follows:
// One row per point, one column per dimension
double[][] data = new double[NUM_OF_POINTS][NUM_OF_DIMENSIONS];
// Populate data according to your application
DatabaseConnection dbc = new ArrayAdapterDatabaseConnection(data);
Database db = new StaticArrayDatabase(dbc, null);
db.initialize();
// DBSCAN algorithm setup
ListParameterization params = new ListParameterization();
params.addParameter(DBSCAN.Parameterizer.EPSILON_ID, 0.04);
params.addParameter(DBSCAN.Parameterizer.MINPTS_ID, 20);
DBSCAN<DoubleVector> dbscan = ClassGenericsUtil.parameterizeOrAbort(DBSCAN.class, params);
// Run DBSCAN on the database
Clustering<Model> result = dbscan.run(db);
I've tested this with the "3clusters-and-noise-2d.csv" dataset and can confirm that I get the same results whether I pass the data via a file or via the ArrayAdapterDatabaseConnection.
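The "populate data" step depends on how your application stores its points. As a minimal sketch, assuming the points live in a list of simple 2D objects (the PointsToArray and Point2D names here are hypothetical and not part of ELKI), the array could be filled like this, using only the Java standard library:

```java
import java.util.Arrays;
import java.util.List;

final class PointsToArray {
  /** Hypothetical application-side point type (not an ELKI class). */
  public static final class Point2D {
    public final double x;
    public final double y;

    public Point2D(double x, double y) {
      this.x = x;
      this.y = y;
    }
  }

  /** Copies the points into an array with one row per point and one column per dimension. */
  public static double[][] toArray(List<Point2D> points) {
    double[][] data = new double[points.size()][2];
    for (int i = 0; i < points.size(); i++) {
      Point2D p = points.get(i);
      data[i][0] = p.x;
      data[i][1] = p.y;
    }
    return data;
  }

  public static void main(String[] args) {
    List<Point2D> points = Arrays.asList(
        new Point2D(0.1, 0.2), new Point2D(0.3, 0.4), new Point2D(0.5, 0.6));
    double[][] data = toArray(points);
    // Prints "3 points, 2 dimensions"
    System.out.println(data.length + " points, " + data[0].length + " dimensions");
  }
}
```

The resulting array can then be handed to ArrayAdapterDatabaseConnection exactly as in the snippet above. Because row i of the array holds the i-th element of the list, keeping the list around gives you a way to look up the original object for a given row index.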
Answer 1:
A complete example can be found in the ELKI sources:
http://elki.dbs.ifi.lmu.de/browser/elki/elki/src/main/java/tutorial/javaapi/PassingDataToELKI.java
It generates random data and runs k-means on it. It also shows how to reliably map DBIDs back
to your data points.
Source: https://stackoverflow.com/questions/31591883/how-to-use-existing-data-in-elki