cluster-analysis

Plot causes “Error: Incorrect Number of Dimensions”

只愿长相守 submitted on 2021-02-01 05:17:21
Question: I am learning the "kohonen" package in R in order to build Self-Organizing Maps (SOMs, also called Kohonen networks, a type of machine learning algorithm). I am following this R tutorial: https://www.rpubs.com/loveb/som I tried to create my own data (this time with both "factor" and "numeric" variables) and run the SOM algorithm (this time using the "supersom()" function instead): #load libraries and adjust colors library(kohonen) #fitting SOMs library(ggplot2

OpenCV - How to apply Kmeans on a grayscale image?

a 夏天 submitted on 2021-01-29 18:37:40
Question: I am trying to cluster a grayscale image using Kmeans. First, a question: is Kmeans the best way to cluster a Mat, or are there newer, more efficient approaches? Second, when I try this: Mat degrees = imread("an image" , IMREAD_GRAYSCALE); const unsigned int singleLineSize = degrees.rows * degrees.cols; Mat data = degrees.reshape(1, singleLineSize); data.convertTo(data, CV_32F); std::vector<int> labels; cv::Mat1f colors; cv::kmeans(data, 3, labels, cv::TermCriteria(cv::TermCriteria::EPS

OpenCV - How to apply Kmeans on a grayscale image?

我怕爱的太早我们不能终老 submitted on 2021-01-29 13:17:52
Question: I am trying to cluster a grayscale image using Kmeans. First, a question: is Kmeans the best way to cluster a Mat, or are there newer, more efficient approaches? Second, when I try this: Mat degrees = imread("an image" , IMREAD_GRAYSCALE); const unsigned int singleLineSize = degrees.rows * degrees.cols; Mat data = degrees.reshape(1, singleLineSize); data.convertTo(data, CV_32F); std::vector<int> labels; cv::Mat1f colors; cv::kmeans(data, 3, labels, cv::TermCriteria(cv::TermCriteria::EPS

R: M3C library - Duplicate row.names error message

南楼画角 submitted on 2021-01-29 09:39:14
Question: I am trying to run consensus clustering using the M3C library in R. My dataset contains 451 samples and ~2500 genes. The row names are the ENTREZ IDs (numeric values) of the genes. I cross-checked the dataset with the "any(duplicated(colnames(MyData)))" command to make sure that there are no duplicate entries in the row names. I ran the following command to perform the consensus clustering with the M3C library: res <- M3C(MyData, cores=8, seed = 123, des = annotation, removeplots = TRUE,

2 Dendrograms + Heatmap from condensed correlationmatrix with scipy

烈酒焚心 submitted on 2021-01-29 05:23:03
Question: I am trying to create something like this: "plotting results of hierarchical clustering ontop of a matrix of data in python". Unfortunately, when I execute the code, I get the following warnings: Warning (from warnings module): File "C:\Users\USER1\Desktop\test.py", line 15 Y = sch.linkage(D, method='centroid') ClusterWarning: scipy.cluster: The symmetric non-negative hollow observation matrix looks suspiciously like an uncondensed distance matrix Warning (from warnings module): File "C:\Users

How to calculate clustering coefficient of each node in the graph in Python using Networkx

有些话、适合烂在心里 submitted on 2021-01-28 20:10:20
Question: I want to calculate the clustering coefficient of each node in a graph using Python and Networkx functions. I know there may be a built-in function for this, but I want to calculate it myself, and my code is not working. Can someone please point out the error? I have tried to test and debug the code. The number of neighbors of each node (n_neighbors) seems to be calculated correctly, but the code after that either does not run or contains an error that I am unable to find. import matplotlib as
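
Since the excerpt cuts off before the asker's code, the following is only a minimal sketch of the hand calculation the question describes, not a fix for that code. It assumes an undirected simple graph given as a mapping from each node to its neighbours. The function name `local_clustering` and the sample graph are illustrative. The adjacency view `G.adj` of a Networkx graph should be usable as the same kind of mapping, but that is not exercised here. Nodes with fewer than two neighbours are given a coefficient of 0 by convention.

```python
from itertools import combinations


def local_clustering(adj):
    """Local clustering coefficient of every node in an undirected simple graph.

    adj maps each node to an iterable of its neighbours (hypothetical input format).
    """
    # Normalise to sets and drop self-loops so they do not count as neighbours.
    neighbours = {node: set(nbrs) - {node} for node, nbrs in adj.items()}
    coeffs = {}
    for node, nbrs in neighbours.items():
        k = len(nbrs)
        if k < 2:
            # No pair of neighbours exists, so the coefficient is taken as 0.
            coeffs[node] = 0.0
            continue
        # Count the edges that actually exist between pairs of neighbours.
        links = sum(
            1 for u, v in combinations(nbrs, 2) if v in neighbours.get(u, ())
        )
        # Divide by the number of possible neighbour pairs, k * (k - 1) / 2.
        coeffs[node] = 2.0 * links / (k * (k - 1))
    return coeffs


# Small illustrative graph: a triangle a-b-c plus a pendant node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
result = local_clustering(graph)
print(result)
```

Here node a has three neighbours and only one of the three possible neighbour pairs (b, c) is connected, so its coefficient is 1/3. Nodes b and c each get 1.0, and d gets 0.0.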

KMeans clustering unbalanced data

只谈情不闲聊 submitted on 2021-01-28 18:57:55
Question: I have a data set with 50 features (c1, c2, c3 ...) and over 80k rows. Each row contains normalised numerical values (ranging 0-1). It is actually a normalised dummy variable, whereby some rows have only a few features, 3-4 (i.e. 0 is assigned if there is no value). Most rows have about 10-20 features. I used KMeans to cluster the data, which always results in one cluster with a large number of members. Upon analysis, I noticed that rows with fewer than 4 features tend to get clustered together

Finding the right package in R for cluster analysis

萝らか妹 submitted on 2021-01-28 12:19:34
Question: I'm trying to find a package in R that finds clusters exceeding a given threshold in a dataset. What I want to know is the cluster duration/size and the individual values of each cluster. For example (a simple one), I have a vector of data: 10 8 6 14 14 7 14 5 11 12 8 11 11 16 20 6 8 8 6 15. The clusters of values larger than 9 are marked with brackets: [10] 8 6 [14 14] 7 [14] 5 [11 12] 8 [11 11 16 20] 6 8 8 6 [15]. So the cluster sizes, in order, are 1, 2, 1, 2, 4, 1. What I want R to do is
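
The run-finding described in this excerpt can be sketched as follows. This is a Python illustration using only the standard library, not the R package the asker is looking for. The helper name `runs_above` is an assumption, and "larger than 9" is read as strictly greater than the threshold.

```python
from itertools import groupby


def runs_above(values, threshold):
    """Return every maximal run of consecutive values strictly above threshold."""
    runs = []
    # groupby splits the sequence into stretches that share the same True/False key.
    for above, group in groupby(values, key=lambda v: v > threshold):
        if above:
            runs.append(list(group))
    return runs


# The vector from the question, with the threshold 9 used in its example.
data = [10, 8, 6, 14, 14, 7, 14, 5, 11, 12, 8, 11, 11, 16, 20, 6, 8, 8, 6, 15]
clusters = runs_above(data, 9)
sizes = [len(run) for run in clusters]
print(sizes)     # [1, 2, 1, 2, 4, 1]
print(clusters)  # the individual values of each cluster
```

The sizes match the ones listed in the question, and `clusters` holds the individual values of each run, for example `[11, 11, 16, 20]` for the fifth one.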

Delete outliers automatically of a calculated agglomerative hierarchical clustering data

不打扰是莪最后的温柔 submitted on 2021-01-28 08:09:14
Question: In cluster analysis, the outliers of a dataset can be easily identified with the single-linkage method. Now I would like to remove the outliers automatically. My idea is to remove the data points that exceed a specified distance value. Here is my code with the mtcars example data: library(cluster) library(dendextend) cluster<-agnes(mtcars,stand=FALSE,method="single") dend = as.dendrogram(cluster) In the plot you can see the resulting dendrogram. The last 4 cars ("Duster 360", "Camaro Z28",

Multivariate Gaussian distribution formula implementation

纵饮孤独 submitted on 2021-01-28 06:22:16
Question: I have a problem implementing the multivariate Gaussian distribution for anomaly detection. I took the formula from Andrew Ng's notes, http://www.holehouse.org/mlclass/15_Anomaly_Detection.html and below is the problem I face. Suppose I have a data set with 2 features and m training examples, i.e. n=2, and want to determine the multivariate Gaussian probability p(x;mu;sigma), which should be an [m*1] matrix because it produces the estimated Gaussian value by feature correlation. The
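
For reference, the standard multivariate Gaussian density that this question refers to can be written out as below. The excerpt is cut off before the actual problem is stated, so this only restates the formula under the usual assumptions: x and mu are n-dimensional column vectors, Sigma is the n-by-n covariance matrix (the question's "sigma"), and the parameters are estimated from the m training examples.

```latex
% Parameter estimates from the m training examples x^{(1)}, ..., x^{(m)}
\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)},
\qquad
\Sigma = \frac{1}{m}\sum_{i=1}^{m}
         \bigl(x^{(i)} - \mu\bigr)\bigl(x^{(i)} - \mu\bigr)^{T}

% Density of a single example x (a scalar)
p(x;\mu,\Sigma) =
  \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
  \exp\!\Bigl(-\tfrac{1}{2}\,(x-\mu)^{T}\,\Sigma^{-1}\,(x-\mu)\Bigr)
```

Each example gives one scalar, so evaluating the density for all m examples and stacking the results yields the [m*1] vector mentioned above. In a vectorised implementation the quadratic form has to be evaluated once per row. The full product of the centred data matrix with Sigma^{-1} and its own transpose is m-by-m, and only its diagonal holds the needed values.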