k-means

Sorting 2D array in new 2D array (K-means clustering) Java

ε祈祈猫儿з submitted on 2020-07-22 06:09:41
Question: As input I have a 2D array PointXY[][] clusters, which looks like this:

[[23.237633,53.78671], [69.15293,17.138134], [23.558687,45.70517]]
. . .
[[47.851738,16.525734], [47.802097,16.689285], [47.946404,16.732542]]
[[47.89601,16.638218], [47.833263,16.478987], [47.88203,16.45793]]
[[47.75438,16.549816], [47.915512,16.506475], [47.768547,16.67624]]
. . .

So the elements of the array are of type PointXY[], defined like this:

public PointXY(float x, float y) {
    this.x = x;
    this.y = y;
}

What I would like…

scikit-learn: Finding the features that contribute to each KMeans cluster

孤街浪徒 submitted on 2020-07-04 06:25:12
Question: Say you have 10 features you are using to create 3 clusters. Is there a way to see the level of contribution each of the features has for each of the clusters? What I want to be able to say is that for cluster k1, features 1, 4, and 6 were the primary features, whereas cluster k2's primary features were 2, 5, and 7. This is the basic setup of what I am using:

k_means = KMeans(init='k-means++', n_clusters=3, n_init=10)
k_means.fit(data_features)
k_means_labels = k_means.labels_

Answer 1: You can use Principal…
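The answer is cut off above, but one common heuristic is to look at the fitted centroids directly: after standardizing the features, the coordinates where a cluster's centroid deviates most from zero are the features that characterize that cluster. A minimal sketch, assuming data_features is the (n_samples, 10) array from the question (the random data here is only a stand-in):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
data_features = StandardScaler().fit_transform(rng.normal(size=(300, 10)))

k_means = KMeans(init='k-means++', n_clusters=3, n_init=10)
k_means.fit(data_features)

# After standardization the global mean of every feature is 0, so the
# centroid coordinates with the largest absolute values mark the features
# that most distinguish each cluster.
for k, center in enumerate(k_means.cluster_centers_):
    top = np.argsort(-np.abs(center))[:3]
    print(f"cluster {k}: primary features {top.tolist()}")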

SKLearn KMeans Convergence Warning

南楼画角 submitted on 2020-06-27 17:05:51
Question: I am using SKLearn's KMeans clustering on a 1D dataset. When I run the code, I get a ConvergenceWarning:

ConvergenceWarning: Number of distinct clusters (<some integer n>) found smaller than n_clusters (<some integer bigger than n>). Possibly due to duplicate points in X.
return_n_iter=True)

I cannot find anything on this aside from the source code, which does not indicate what exactly is going wrong. I believe my bug is either because I have a 1D data…
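The warning text itself names the usual cause: the data contains fewer distinct points than the requested number of clusters, which happens easily with duplicate-heavy 1D data. A minimal sketch of a workaround, capping n_clusters at the number of distinct values (the toy data is made up for illustration):

import numpy as np
from sklearn.cluster import KMeans

X = np.array([1.0, 1.0, 1.0, 2.0, 2.0]).reshape(-1, 1)  # only 2 distinct points

# KMeans(n_clusters=4).fit(X) would emit the ConvergenceWarning here, since
# at most 2 non-empty clusters can be formed from 2 distinct points.
k = min(4, len(np.unique(X)))
labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
print(k, labels)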

Is K-means for clustering data with many zero values?

落花浮王杯 submitted on 2020-05-15 18:38:49
Question: I need to cluster a matrix which contains mostly zero values. Is k-means appropriate for this kind of data, or do I need to consider a different algorithm?

Answer 1: No. The reason is that the mean is not sensible on sparse data. The resulting mean vectors will have very different characteristics than your actual data; they will often end up being more similar to each other than to actual documents! There are some modifications that improve k-means for sparse data, such as spherical k-means. But…
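Spherical k-means is not in scikit-learn itself, but a rough approximation is to L2-normalize the rows and run ordinary KMeans, so that Euclidean distance becomes monotone in cosine distance (true spherical k-means also re-normalizes the centroids at every iteration, which this sketch skips). All data below is made up for illustration:

import numpy as np
from scipy import sparse
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

# A toy sparse matrix: 200 rows, 50 columns, roughly 95% zeros.
X = sparse.random(200, 50, density=0.05, format='csr', random_state=0)

# L2-normalize each row so all points lie on the unit sphere.
X_norm = normalize(X)
labels = KMeans(n_clusters=5, n_init=10).fit_predict(X_norm)
print(labels[:10])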

graphics window not working properly in `kml` package

不羁的心 submitted on 2020-05-14 09:17:13
Question: I started working with the package kml to perform longitudinal cluster analysis. The package claims to have an interactive graphics window that lets you explore the clusterings found by kml. According to the docs, the window is opened by calling the function choice. But the window does not open. Instead I get an error:

Error in setGraphicsEventEnv(which, as.environment(list(...))) :
  this graphics device does not support event handling

From the docs, ?choice: At first, choice opens a…

How can we show the trajectories belonging to clusters in the `kml` package?

孤街醉人 submitted on 2020-05-13 22:55:10
Question: The kml package implements k-means for longitudinal data. The clustering works just fine. Now I'm wondering how I can show the 'structure' of the clusters, for example, by coloring them. A very simple example from the docs (the help file of the clusterLongData function):

library(kml)
traj <- matrix(c(1,2,3,1,4,
                 3,6,1,8,10,
                 1,2,1,3,2,
                 4,2,5,6,3,
                 4,3,4,4,4,
                 7,6,5,5,4), 6)
myCld <- clusterLongData(
  traj=traj,
  idAll=as.character(c(100,102,103,109,115,123)),
  time=c(1,2,4,8,15),
  varNames="P",
  maxNA=3…

Fast (< n^2) clustering algorithm

自闭症网瘾萝莉.ら submitted on 2020-05-09 17:47:25
Question: I have 1 million 5-dimensional points that I need to group into k clusters, with k << 1 million. In each cluster, no two points should be too far apart (e.g., the clusters could be bounding spheres with a specified radius). That means that there probably have to be many clusters of size 1. But! I need the running time to be well below n^2; n log n or so should be fine. The reason I'm doing this clustering is to avoid computing a distance matrix of all n points (which takes n^2 time or many hours),…
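One standard way to stay well below n^2 is spatial hashing: snap each point to a grid whose cells are small enough that any two points sharing a cell are within the target radius, then treat each non-empty cell as a cluster. This runs in O(n) but over-segments near cell borders, which fits the question's tolerance for many size-1 clusters. A sketch under those assumptions (the radius and the random points are illustrative):

import numpy as np
from collections import defaultdict

def grid_cluster(points, radius):
    # A cell of side radius/sqrt(d) has diagonal exactly `radius` in d
    # dimensions, so any two points in the same cell are <= radius apart.
    d = points.shape[1]
    cells = np.floor(points / (radius / np.sqrt(d))).astype(np.int64)
    clusters = defaultdict(list)
    for i, key in enumerate(map(tuple, cells)):
        clusters[key].append(i)
    return list(clusters.values())

rng = np.random.default_rng(0)
pts = rng.random((1_000_000, 5)).astype(np.float32)
clusters = grid_cluster(pts, radius=0.05)
print(len(clusters), "clusters")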

Scikit Learn - K-Means - Elbow criterion

Deadly submitted on 2020-05-09 17:43:05
Question: Today I'm trying to learn something about k-means. I understand the algorithm and I know how it works. Now I'm looking for the right k... I found the elbow criterion as a method to detect the right k, but I do not understand how to use it with scikit-learn. In scikit-learn I'm clustering things in this way:

kmeans = KMeans(init='k-means++', n_clusters=n_clusters, n_init=10)
kmeans.fit(data)

So should I do this several times for n_clusters = 1...n and watch the error rate to get the…
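That loop is indeed the usual recipe: fit KMeans for each candidate k and record the within-cluster sum of squares, which scikit-learn exposes as the fitted model's inertia_ attribute; the "elbow" is the k where the curve stops dropping steeply. A minimal sketch (data here is random stand-in data, not the question's):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = rng.normal(size=(300, 4))

ks = range(1, 11)
inertias = [KMeans(init='k-means++', n_clusters=k, n_init=10).fit(data).inertia_
            for k in ks]

# Look for the bend: the k past which extra clusters buy little reduction.
plt.plot(list(ks), inertias, marker='o')
plt.xlabel('n_clusters')
plt.ylabel('inertia (within-cluster sum of squares)')
plt.show()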