elki

ELKI k-means clustering "Task failed" error for high-dimensional data

只愿长相守 submitted on 2019-12-04 05:51:43
Question: I have 60,000 documents, which I processed in gensim to get a 60000 × 300 matrix. I exported this as a CSV file. When I import it into the ELKI environment and run k-means clustering, I get the error below:

    Task failed
    de.lmu.ifi.dbs.elki.data.type.NoSupportedDataTypeException: No data type found satisfying: NumberVector,field AND NumberVector,variable
    Available types: DBID DoubleVector,variable,mindim=266,maxdim=300 LabelList
        at de.lmu.ifi.dbs.elki.database.AbstractDatabase.getRelation
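The error message itself hints at the cause: ELKI found DoubleVector,variable,mindim=266,maxdim=300, meaning the rows were parsed into vectors with anywhere from 266 to 300 dimensions, whereas k-means asks for NumberVector,field, a relation in which every vector has the same dimensionality. One plausible explanation (an assumption, not confirmed in the question) is that some rows contain empty or non-numeric cells; ELKI's CSV parser treats tokens that do not parse as numbers as labels rather than vector components. The Java sketch below, with the hypothetical class name CsvDimensionCheck, lists the lines whose count of numeric columns differs from the expected 300. It assumes comma separators and counts a token as numeric if Double.parseDouble accepts it, which need not match ELKI's tokenizer exactly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class CsvDimensionCheck {

    // Counts the comma-separated tokens on one line that parse as doubles.
    static int numericColumns(String line) {
        int count = 0;
        for (String token : line.split(",", -1)) {
            try {
                Double.parseDouble(token.trim());
                count++;
            } catch (NumberFormatException e) {
                // Empty or non-numeric token: not counted as a vector component.
            }
        }
        return count;
    }

    // Returns the 1-based line numbers whose numeric column count differs from expected.
    static List<Integer> badRows(List<String> lines, int expected) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (numericColumns(lines.get(i)) != expected) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    // Usage: java CsvDimensionCheck vectors.csv 300
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        int expected = Integer.parseInt(args[1]);
        List<Integer> bad = badRows(lines, expected);
        System.out.println(bad.size() + " rows do not have " + expected + " numeric columns");
        // Show at most the first ten offending lines.
        for (int row : bad.subList(0, Math.min(10, bad.size()))) {
            System.out.println("line " + row + ": " + numericColumns(lines.get(row - 1)) + " columns");
        }
    }
}
```

If the check reports offending lines, fixing or dropping those rows before import should give every vector the same dimensionality.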

Running clustering algorithms in ELKI

无人久伴 submitted on 2019-12-03 21:02:40
I need to run a k-medoids clustering algorithm programmatically using ELKI. I have a similarity matrix that I want to use as the algorithm's input. Is there a code snippet available that shows how to run ELKI algorithms? I basically need to know how to create Database and Relation objects, create a custom distance function, and read the algorithm's output. Unfortunately, the ELKI tutorial (http://elki.dbs.ifi.lmu.de/wiki/Tutorial) focuses on the GUI version and on implementing new algorithms, and trying to write code by looking at the Javadoc is frustrating. If someone is aware of any easy-to-use
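As a library-independent illustration of what k-medoids does with a precomputed matrix, here is a minimal Java sketch. It is not ELKI code and does not show ELKI's API. Assumptions: the similarities lie in [0, 1] and are turned into dissimilarities as d = 1 - s; the caller supplies the initial medoids; and the update rule is the simple alternating (Voronoi-iteration) variant rather than PAM's swap search. KMedoidsSketch and its method names are hypothetical.

```java
final class KMedoidsSketch {

    // Assumption: similarities are in [0, 1]; dissimilarity is taken as 1 - s.
    static double[][] toDistance(double[][] sim) {
        int n = sim.length;
        double[][] dist = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                dist[i][j] = 1.0 - sim[i][j];
            }
        }
        return dist;
    }

    // Assigns each point to its nearest medoid (ties go to the lower cluster index).
    static int[] assign(double[][] dist, int[] medoids) {
        int[] labels = new int[dist.length];
        for (int p = 0; p < dist.length; p++) {
            int best = 0;
            for (int c = 1; c < medoids.length; c++) {
                if (dist[p][medoids[c]] < dist[p][medoids[best]]) {
                    best = c;
                }
            }
            labels[p] = best;
        }
        return labels;
    }

    // Sum of distances from a candidate medoid to all points currently labelled c.
    static double cost(double[][] dist, int[] labels, int c, int candidate) {
        double sum = 0;
        for (int p = 0; p < dist.length; p++) {
            if (labels[p] == c) {
                sum += dist[candidate][p];
            }
        }
        return sum;
    }

    // Alternating k-medoids: assign points, then move each medoid to the cluster
    // member with the smallest total distance; stop when no medoid moves.
    static int[] cluster(double[][] dist, int[] initialMedoids, int maxIter) {
        int[] medoids = initialMedoids.clone();
        for (int iter = 0; iter < maxIter; iter++) {
            int[] labels = assign(dist, medoids);
            boolean changed = false;
            for (int c = 0; c < medoids.length; c++) {
                int best = medoids[c];
                double bestCost = cost(dist, labels, c, best);
                for (int cand = 0; cand < dist.length; cand++) {
                    if (labels[cand] == c) {
                        double candCost = cost(dist, labels, c, cand);
                        if (candCost < bestCost) { // strict: keep current medoid on ties
                            best = cand;
                            bestCost = candCost;
                        }
                    }
                }
                if (best != medoids[c]) {
                    medoids[c] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break;
            }
        }
        return medoids;
    }
}
```

For a similarity matrix sim, cluster(toDistance(sim), new int[] {0, 1}, 100) returns the final medoid indices, and assign(...) with those medoids gives each point's cluster. The alternating update is cheap per iteration, but PAM-style swapping generally finds lower-cost solutions.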

ELKI k-means clustering "Task failed" error for high-dimensional data

岁酱吖の submitted on 2019-12-02 12:30:32
I have 60,000 documents, which I processed in gensim to get a 60000 × 300 matrix. I exported this as a CSV file. When I import it into the ELKI environment and run k-means clustering, I get the error below:

    Task failed
    de.lmu.ifi.dbs.elki.data.type.NoSupportedDataTypeException: No data type found satisfying: NumberVector,field AND NumberVector,variable
    Available types: DBID DoubleVector,variable,mindim=266,maxdim=300 LabelList
        at de.lmu.ifi.dbs.elki.database.AbstractDatabase.getRelation(AbstractDatabase.java:126)
        at de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm.run(AbstractAlgorithm.java:81)
        at

Using ELKI's Distance Function

折月煮酒 submitted on 2019-12-02 06:35:56
Question: This is a follow-up to a previous question, where we noted in the comments that using Euclidean distances on lat/long coordinates does not yield correct results. I read in the documentation that ELKI supports geographic data, namely in its distance functions, which are available in the various clustering algorithms. In the ELKI user interface, I can see options to replace the default distance function (Euclidean) with a better-suited one. I also see that in that case, you need to provide a datum, which
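To see why plain Euclidean distance on latitude/longitude is misleading: one degree of longitude spans about 111 km at the equator but only about half that at 60° latitude, although the Euclidean distance between the raw coordinates is 1 in both cases. The Java sketch below (hypothetical class GreatCircle) computes the haversine great-circle distance on a sphere with an assumed mean Earth radius of 6371 km. It is a spherical approximation and does not take a datum into account, unlike the geo options described above.

```java
final class GreatCircle {

    // Assumed mean Earth radius in kilometres (spherical model, no datum/ellipsoid).
    static final double EARTH_RADIUS_KM = 6371.0;

    // Haversine great-circle distance between (lat1, lon1) and (lat2, lon2), all in degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double sinDPhi = Math.sin((phi2 - phi1) / 2);
        double sinDLambda = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinDPhi * sinDPhi + Math.cos(phi1) * Math.cos(phi2) * sinDLambda * sinDLambda;
        // min() guards against rounding pushing sqrt(a) slightly above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Plain Euclidean distance on raw degree values, for comparison only.
    static double euclideanDegrees(double lat1, double lon1, double lat2, double lon2) {
        return Math.hypot(lat2 - lat1, lon2 - lon1);
    }

    public static void main(String[] args) {
        // Both pairs are exactly one degree of longitude apart...
        System.out.println(euclideanDegrees(0, 0, 0, 1) + " vs " + euclideanDegrees(60, 0, 60, 1));
        // ...but their ground distances differ by roughly a factor of two.
        System.out.printf("%.1f km vs %.1f km%n", haversineKm(0, 0, 0, 1), haversineKm(60, 0, 60, 1));
    }
}
```

Any clustering that compares raw degree differences treats those two pairs as equally far apart, which is why a geographic distance function matters for lat/long data.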

Using ELKI on custom objects and making sense of results

若如初见. submitted on 2019-12-02 06:28:07
I am trying to use ELKI's SLINK implementation of hierarchical clustering in my program. I have a set of objects (of my own type) that need to be clustered, so I convert them to feature vectors before clustering. This is how I currently got it to run and produce a result (code is in Scala):

    val clusterer = new SLINK(CosineDistanceFunction.STATIC, 3)
    // Wrap the in-memory feature vectors as a data source
    val connection = new ArrayAdapterDatabaseConnection(featureVectors)
    // null: no index factories
    val database = new StaticArrayDatabase(connection, null)
    database.initialize()
    // Run SLINK and cast the result to a clustering
    val result = clusterer.run(database).asInstanceOf[Clustering[_ <: Model]]

Now, the result
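SLINK is Sibson's single-linkage algorithm, and its natural output is a "pointer representation" rather than a list of clusters. To make that output easier to interpret, here is a library-independent Java sketch of what SLINK computes, using cosine distance as in the question, plus a helper that cuts the hierarchy at a height h to obtain flat clusters. This is not ELKI's implementation; SlinkSketch and its method names are hypothetical.

```java
final class SlinkSketch {

    // Cosine distance 1 - cos(a, b); assumes neither vector is all zeros.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return 1.0 - dot / Math.sqrt(na * nb);
    }

    // Sibson's SLINK, O(n^2) time and O(n) extra memory. Afterwards lambda[i] is the
    // height at which object i stops being the highest-indexed object of its cluster,
    // and pi[i] is the highest-indexed object of the cluster it joins at that height.
    // The last object keeps lambda = infinity.
    static void slink(double[][] data, int[] pi, double[] lambda) {
        int n = data.length;
        double[] m = new double[n];
        for (int k = 0; k < n; k++) {
            pi[k] = k;
            lambda[k] = Double.POSITIVE_INFINITY;
            for (int i = 0; i < k; i++) {
                m[i] = cosineDistance(data[i], data[k]);
            }
            for (int i = 0; i < k; i++) {
                if (lambda[i] >= m[i]) {
                    m[pi[i]] = Math.min(m[pi[i]], lambda[i]);
                    lambda[i] = m[i];
                    pi[i] = k;
                } else {
                    m[pi[i]] = Math.min(m[pi[i]], m[i]);
                }
            }
            for (int i = 0; i < k; i++) {
                if (lambda[i] >= lambda[pi[i]]) {
                    pi[i] = k;
                }
            }
        }
    }

    // Cuts the hierarchy at height h; returns each object's cluster representative.
    static int[] cutAtHeight(int[] pi, double[] lambda, double h) {
        int n = pi.length;
        int[] label = new int[n];
        // pi[i] > i for every non-final object, so walking backwards guarantees
        // label[pi[i]] is already known when object i is visited.
        for (int i = n - 1; i >= 0; i--) {
            label[i] = (lambda[i] <= h) ? label[pi[i]] : i;
        }
        return label;
    }
}
```

For example, allocate int[] pi and double[] lambda of the same length as the data, call slink(data, pi, lambda), then cutAtHeight(pi, lambda, 0.5) to group objects whose single-link cosine distance chains stay at or below 0.5.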

Using a Geo Distance Function on ELKI

我的梦境 submitted on 2019-12-02 01:20:33
I am using ELKI to mine some geospatial data (lat, long pairs), and I am quite concerned about using the right data types and algorithms. In my algorithm's parameterizer, I tried to replace the default distance function with a geo function (LngLatDistanceFunction, as I am using x, y data) as below:

    params.addParameter(DISTANCE_FUNCTION_ID, geo.LngLatDistanceFunction.class);

However, the results are quite surprising: it creates clusters of a repeated point, as in the example below: (2.17199922, 41.38190043, NaN), (2.17199922, 41.38190043, NaN), (2.17199922, 41.38190043, NaN), (2.17199922, 41

Running DBSCAN in ELKI

笑着哭i submitted on 2019-11-27 07:21:34
Question: I am trying to cluster some geospatial data; I previously tried the WEKA library. I found this benchmark and decided to try ELKI. Despite the advice not to use ELKI as a Java library (which is supposed to be less well maintained than the UI), I incorporated it into my application, and I am quite happy with the results. The structures it uses to store data are far more efficient than the ones used by Weka, and the fact that it has the option of using a spatial index is