k-means

Get ordered kmeans cluster labels

≡放荡痞女 submitted on 2020-04-30 07:32:05
Question: Say I have a data set x and run the following k-means clustering: fit <- kmeans(x,2). My question is about the output of fit$cluster. I know that it gives a vector of integers (from 1:k) indicating the cluster to which each point is allocated. Is there instead a way to have the clusters labeled 1, 2, etc. in order of decreasing numerical value of their center? For example, if x=c(1.5,1.4,1.45,.2,.3,.3), then fit$cluster should result in (1,1,1,2,2,2) but not result in (2,2,2,1
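One way to get this ordering is to rank the cluster centers in decreasing order and map each old label to its rank. The sketch below shows that idea in Python/NumPy rather than R, so it is not the R answer itself. It assumes 1-D data, 1-based labels (as `fit$cluster` returns), and that `centers[j]` is the center of label `j + 1` (in R the centers are in `fit$centers`). The function name `relabel_by_center_desc` is hypothetical.

```python
import numpy as np

def relabel_by_center_desc(labels, centers):
    """Renumber clusters 1..k so that cluster 1 has the largest center.

    labels:  1-based integer labels (hypothetical input, like R's fit$cluster).
    centers: 1-D centers; centers[j] belongs to old label j + 1.
    """
    labels = np.asarray(labels)
    centers = np.asarray(centers, dtype=float)
    # order[r] = 0-based index of the cluster with the (r+1)-th largest center
    order = np.argsort(-centers)
    # rank[j] = new 1-based label for old 0-based cluster j
    rank = np.empty_like(order)
    rank[order] = np.arange(1, len(centers) + 1)
    return rank[labels - 1]

# Example from the question: the "high" group happened to get label 2.
x = np.array([1.5, 1.4, 1.45, 0.2, 0.3, 0.3])
labels = np.array([2, 2, 2, 1, 1, 1])
centers = np.array([x[3:].mean(), x[:3].mean()])  # centers of old labels 1 and 2
print(relabel_by_center_desc(labels, centers))     # [1 1 1 2 2 2]
```

For multi-dimensional centers there is no single "numerical value", so a sort key (for example the first coordinate or the center's norm) would have to be chosen; that choice is an assumption, not something the question specifies.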

How to choose and plot the quality criterion in `kml` function?

只愿长相守 submitted on 2020-04-30 06:38:46
Question: I just started working with the kml package to perform longitudinal k-means clustering in R. By default, the kml function uses the Calinski Harabatz Sorted criterion to choose the 'best' clustering, so when you access the 'best' clustering you always see the Calinski Harabatz Sorted criterion. How can we choose another quality criterion? A minimal example: library(kml) # some data cld <- generateArtificialLongData(25) # perform clustering kml(cld) # choose the 'best' clustering: choice

K-Means: Lloyd, Forgy, MacQueen, Hartigan-Wong

a 夏天 submitted on 2020-04-29 05:42:59
Question: I'm working with the k-means algorithm in R and I want to understand the differences between the four algorithms Lloyd, Forgy, MacQueen and Hartigan-Wong, which are available in the function "kmeans" in the stats package. However, I was not able to find a sufficient answer to this question. I found only scarce information (see http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/K-Means). From this description, Lloyd, Forgy, and Hartigan-Wong seem the same to me. Minimizing the within
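In R's `kmeans`, "Lloyd" and "Forgy" are documented as alternative names for one batch algorithm. MacQueen's variant instead updates centers as soon as a point changes cluster. Hartigan-Wong (the default) roughly works by moving a point only if doing so lowers the total within-cluster sum of squares, and it accounts for how the centers shift. The NumPy sketch below is not R's C/Fortran code. It contrasts one Lloyd/Forgy iteration with one MacQueen-style pass, uses hypothetical helper names, and does not show Hartigan-Wong.

```python
import numpy as np

def nearest(X, C):
    # Index of the nearest center (squared Euclidean distance) for each row of X.
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def lloyd_iteration(X, C):
    """Lloyd/Forgy (batch): reassign every point first, then recompute all centers."""
    labels = nearest(X, C)
    C_new = C.copy()
    for j in range(len(C)):
        members = X[labels == j]
        if len(members) > 0:  # an empty cluster keeps its old center
            C_new[j] = members.mean(axis=0)
    return labels, C_new

def macqueen_pass(X, C, labels):
    """MacQueen-style (online): update the two affected centers as soon as a point moves.

    Assumes C[j] is the mean of the points currently labelled j.
    """
    C, labels = C.copy(), labels.copy()
    counts = np.bincount(labels, minlength=len(C)).astype(float)
    for i, x in enumerate(X):
        old, new = labels[i], nearest(x[None, :], C)[0]
        if new != old and counts[old] > 1:
            # Running-mean updates: take x out of the old mean, add it to the new one.
            C[old] = (C[old] * counts[old] - x) / (counts[old] - 1)
            C[new] = (C[new] * counts[new] + x) / (counts[new] + 1)
            counts[old] -= 1
            counts[new] += 1
            labels[i] = new
    return labels, C

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
C0 = np.array([[0.0, 0.0], [0.0, 1.0]])        # both starting centers in the left group
labels, C1 = lloyd_iteration(X, C0)            # labels == [0, 1, 1, 1]
labels_mq, C2 = macqueen_pass(X, C1, labels)   # labels_mq == [0, 0, 1, 1]
print(labels_mq, C2)                           # C2 == [[0, 0.5], [10, 10.5]]
```

The difference is only in *when* centers move. Lloyd/Forgy waits for a full pass of assignments. MacQueen moves the centers immediately, so later points in the same pass already see the updated centers.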

kmeans: Quick-TRANSfer stage steps exceeded maximum

旧巷老猫 submitted on 2020-04-07 14:29:29
Question: I am running k-means clustering in R on a dataset with 636,688 rows and 7 columns using the standard stats package: kmeans(dataset, centers = 100, nstart = 25, iter.max = 20). I get the following error: Quick-TRANSfer stage steps exceeded maximum (= 31834400). Although one can view the code at http://svn.r-project.org/R/trunk/src/library/stats/R/kmeans.R, I am unsure what is going wrong. I assume my problem has to do with the size of my dataset, but I would be grateful if someone

[Notes] Writing the k-means algorithm by hand with numpy

时光怂恿深爱的人放手 submitted on 2020-03-30 02:49:08
The code covers data generation and visualization. Note: the code below is for reference only. Real use needs additional constraints, for example a maximum number of iterations.

```python
import numpy as np
from matplotlib import pyplot as plt

# - generate random data
def generate_data(n_point_per_cate, center_point_list):
    """
    n_point_per_cate: number of points per category
    center_point_list: list of center points
    """
    points_list = []
    for point in center_point_list:
        # standard-normal blob of n points shifted to this center
        points_list.append(np.random.randn(n_point_per_cate, 2) + np.array(point))
    return np.concatenate(points_list, axis=0)

# - generate random data
data = generate_data(100, [[3, 4], [10, -4], [-5, 0]])
print(data.shape)  # (300, 2)

# - visualize data
plt.scatter(data[:, 0], data[:, 1])
```
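The note above says a real implementation needs a cap on the number of iterations. A minimal NumPy sketch of Lloyd-style k-means with such a cap (`max_iter`) and a convergence tolerance (`tol`) might look like this. The function name, the parameters, and the random-data-point initialisation are assumptions for illustration, not the author's code.

```python
import numpy as np

def kmeans(data, k, max_iter=100, tol=1e-6, seed=0):
    """Lloyd-style k-means with a hard cap on iterations (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct, randomly chosen data points.
    centers = data[rng.choice(len(data), size=k, replace=False)]

    def assign(c):
        # Squared Euclidean distance from every point to every center -> nearest index.
        d = ((data[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    for _ in range(max_iter):
        labels = assign(centers)
        # Update step: each center becomes the mean of its points;
        # an empty cluster simply keeps its previous center.
        new_centers = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:  # centers no longer move: stop early
            break
    # Final assignment so that the labels match the returned centers.
    return assign(centers), centers

# Usage in the spirit of the post: three Gaussian blobs around given centers.
rng = np.random.default_rng(42)
true_centers = np.array([[3, 4], [10, -4], [-5, 0]])
data = np.concatenate([rng.standard_normal((100, 2)) + c for c in true_centers])
labels, centers = kmeans(data, k=3, max_iter=50)
print(centers.round(2))
```

Because the result depends on the random initial centers, a single run can end in a poor local optimum. Running several seeds and keeping the lowest within-cluster sum of squares is a common remedy.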

Section 17: K-means

喜你入骨 submitted on 2020-03-27 16:12:50
sklearn API: from sklearn.cluster import KMeans. The principle of clustering. Evaluation metric: the silhouette coefficient, which lies in [-1, 1]; in practice, a score above roughly 0 to 0.1 already indicates quite good clustering.

```python
from sklearn.cluster import KMeans  # K-means API
import pandas as pd
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
from sklearn.metrics import silhouette_score  # silhouette coefficient API

# Data source: https://www.kaggle.com/c/instacart-market-basket-analysis/data
# Read the tables
prior = pd.read_csv(r"E:\360Downloads\Software\降维案列数据\order_products__prior.csv")
products = pd.read_csv(r"E:\360Downloads\Software\降维案列数据\products.csv")
order = pd.read_csv(r"E:\360Downloads\Software\降维案列数据\order.csv")
aisles = pd.read
```
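To see where the [-1, 1] range of the silhouette coefficient comes from, it can be computed by hand. For each point, a is the mean distance to the other points in its own cluster, b is the smallest mean distance to the points of any other cluster, and s = (b - a) / max(a, b). The score is the mean of s over all points. The NumPy sketch below assumes Euclidean distance and at least two clusters. `silhouette` is a hypothetical helper, not sklearn's `silhouette_score`, although on small inputs like this the two should agree.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 clusters)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise distance matrix: O(n^2) memory, fine for small examples only.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton cluster: s(i) is taken as 0 by convention
        # a: mean distance to the *other* members (D[i, i] = 0 adds nothing to the sum)
        a = D[i, own].sum() / (n_own - 1)
        # b: mean distance to the nearest other cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s.mean()

X = [[0.0], [1.0], [10.0], [11.0]]
print(silhouette(X, [0, 0, 1, 1]))  # well separated: 359/399, about 0.90
print(silhouette(X, [0, 1, 0, 1]))  # mixed-up labels: about -0.45
```

A score near 1 means points sit much closer to their own cluster than to the next one. Negative values mean many points would fit better in another cluster. Because of the O(n^2) distance matrix, a dataset the size of the Instacart tables would need subsampling before scoring.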