dendrogram

scipy.cluster.hierarchy: labels seems not in the right order, and confused by the value of the vertical axes

孤街浪徒 posted on 2019-12-02 18:25:19
Question: I know that scipy.cluster.hierarchy focuses on dealing with a distance matrix. But now I have a similarity matrix... After I plot it using dendrogram, something weird happens. Here is the code: similarityMatrix = np.array(([1,0.75,0.75,0,0,0,0], [0.75,1,1,0.25,0,0,0], [0.75,1,1,0.25,0,0,0], [0,0.25,0.25,1,0.25,0.25,0], [0,0,0,0.25,1,1,0.75], [0,0,0,0.25,1,1,0.75], [0,0,0,0,0.75,0.75,1])) Here is the linkage method: Z_sim = sch.linkage(similarityMatrix) plt.figure(1) plt.title(

Plot dendrogram using sklearn.AgglomerativeClustering

此生再无相见时 posted on 2019-12-02 17:26:19
I'm trying to build a dendrogram using the children_ attribute provided by AgglomerativeClustering, but so far I'm out of luck. I can't use scipy.cluster since the agglomerative clustering provided in scipy lacks some options that are important to me (such as the option to specify the number of clusters). I would be really grateful for any advice out there. import sklearn.cluster clstr = sklearn.cluster.AgglomerativeClustering(n_clusters=2) clstr.children_ Here is a simple function for taking a hierarchical clustering model from sklearn and plotting it using the scipy dendrogram function. Seems
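One possible way to get from a fitted AgglomerativeClustering model to a dendrogram is sketched here: build a SciPy-style linkage matrix from children_ plus the merge distances, then pass it to scipy.cluster.hierarchy.dendrogram. This is a sketch under assumptions, not the code from the excerpt above: the helper name linkage_from_model and the toy array X are made up for illustration, and compute_distances=True assumes scikit-learn 0.24 or later.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram
from sklearn.cluster import AgglomerativeClustering


def linkage_from_model(model):
    """Hypothetical helper: SciPy-style linkage matrix from a fitted model."""
    n_samples = len(model.labels_)
    counts = np.zeros(model.children_.shape[0])
    for i, pair in enumerate(model.children_):
        size = 0
        for child in pair:
            if child < n_samples:
                size += 1  # an original sample (leaf)
            else:
                size += counts[child - n_samples]  # an earlier merge
        counts[i] = size
    # Columns: left child, right child, merge distance, cluster size.
    return np.column_stack(
        [model.children_, model.distances_, counts]
    ).astype(float)


# Toy data (made up): three well-separated pairs of points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0],
              [5.0, 6.0], [10.0, 0.0], [10.0, 1.0]])

model = AgglomerativeClustering(
    n_clusters=2, compute_full_tree=True, compute_distances=True
).fit(X)

Z = linkage_from_model(model)
# no_plot=True returns the layout only; drop it to draw with matplotlib.
tree = dendrogram(Z, no_plot=True)
```

With this shape, n_clusters still controls model.labels_, while the full merge tree stays available for plotting.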

Extracting clusters from seaborn clustermap

二次信任 posted on 2019-12-02 15:56:36
I am using the seaborn clustermap to create clusters, and visually it works great (this example produces very similar results). However, I am having trouble figuring out how to programmatically extract the clusters. For instance, in the example link, how could I find out that 1-1 rh, 1-1 lh, 5-1 rh, 5-1 lh make a good cluster? Visually it's easy. I am trying to look through the data and the dendrograms, but I'm having little success. EDIT: Code from example: import pandas as pd import seaborn as sns sns.set(font="monospace") df = sns.load_dataset("brain_networks", header=[0, 1, 2],
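One way to get cluster membership programmatically is to compute the linkage with SciPy and cut it with fcluster. The sketch below uses synthetic data instead of the brain_networks dataset, and the column names, the average/correlation settings and the choice of two clusters are all assumptions made for illustration; the method and metric should match whatever is passed to clustermap.

```python
import numpy as np
from scipy.cluster import hierarchy as sch

rng = np.random.default_rng(0)

# Synthetic stand-in data: 30 observations, 6 columns.
# Columns a1..a3 follow one signal, b1..b3 follow another.
signal_a = rng.normal(size=30)
signal_b = rng.normal(size=30)
columns = [signal_a + 0.05 * rng.normal(size=30) for _ in range(3)]
columns += [signal_b + 0.05 * rng.normal(size=30) for _ in range(3)]
data = np.column_stack(columns)
names = ["a1", "a2", "a3", "b1", "b2", "b3"]  # hypothetical labels

# Cluster the columns: each column becomes one observation vector.
col_linkage = sch.linkage(data.T, method="average", metric="correlation")

# Cut the tree into at most two flat clusters.
cluster_ids = sch.fcluster(col_linkage, t=2, criterion="maxclust")

# Group column names by cluster id.
groups = {}
for name, cid in zip(names, cluster_ids):
    groups.setdefault(int(cid), []).append(name)
```

The same linkage can be handed to sns.clustermap through its col_linkage argument so the plot and the extracted groups agree; alternatively, the linkage seaborn computed can be read from the returned grid as g.dendrogram_col.linkage.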

How to plot dendrograms with large datasets?

♀尐吖头ヾ posted on 2019-12-02 15:52:24
I am using the ape (Analysis of Phylogenetics and Evolution) package in R, which has dendrogram drawing functionality. I use the following commands to read the data in Newick format and draw a dendrogram using the plot function: library("ape") gcPhylo <- read.tree(file = "gc.tree") plot(gcPhylo, show.node.label = TRUE) As the data set is quite large, it is impossible to see any details in the lower levels of the tree. I can see just black areas but no details. I can only see a few levels from the top, and then no detail. I was wondering if there is any zoom capability in the plot function. I tried to

Using R to cluster based on euclidean distance and a complete linkage metric, too many vectors?

江枫思渺然 posted on 2019-12-02 11:43:31
Question: I am trying to figure out how to read a counts matrix into R, and then cluster based on Euclidean distance and a complete linkage metric. The original matrix has 56,000 rows (genes) and 7 columns (treatments). I want to see if there is a clustering relationship between the treatments. However, every time I try to do this, I first get an error stating: Error: cannot allocate vector of size 544.4 Gb Since I'm trying to reproduce work that has been published by someone else, I am wondering if

scipy.cluster.hierarchy: labels seems not in the right order, and confused by the value of the vertical axes

本秂侑毒 posted on 2019-12-02 11:07:55
I know that scipy.cluster.hierarchy focuses on dealing with a distance matrix. But now I have a similarity matrix... After I plot it using dendrogram, something weird happens. Here is the code: similarityMatrix = np.array(([1,0.75,0.75,0,0,0,0], [0.75,1,1,0.25,0,0,0], [0.75,1,1,0.25,0,0,0], [0,0.25,0.25,1,0.25,0.25,0], [0,0,0,0.25,1,1,0.75], [0,0,0,0.25,1,1,0.75], [0,0,0,0,0.75,0.75,1])) Here is the linkage method: Z_sim = sch.linkage(similarityMatrix) plt.figure(1) plt.title('similarity') sch.dendrogram( Z_sim, labels=['1','2','3','4','5','6','7'] ) plt.show() But here is the outcome
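A likely source of the confusion is that sch.linkage treats a 2-D array as a set of observation vectors (one per row), not as a precomputed distance matrix; a precomputed matrix has to be passed in condensed 1-D form. The sketch below is one possible way to handle the matrix from the question. It assumes the similarities lie in [0, 1], so that 1 - similarity is a usable dissimilarity, and the choice of average linkage is an assumption, not something taken from the question.

```python
import numpy as np
from scipy.cluster import hierarchy as sch
from scipy.spatial.distance import squareform

similarity = np.array([
    [1, 0.75, 0.75, 0, 0, 0, 0],
    [0.75, 1, 1, 0.25, 0, 0, 0],
    [0.75, 1, 1, 0.25, 0, 0, 0],
    [0, 0.25, 0.25, 1, 0.25, 0.25, 0],
    [0, 0, 0, 0.25, 1, 1, 0.75],
    [0, 0, 0, 0.25, 1, 1, 0.75],
    [0, 0, 0, 0, 0.75, 0.75, 1],
])

# Assumption: similarity in [0, 1], so 1 - s acts as a distance.
distance = 1.0 - similarity
np.fill_diagonal(distance, 0.0)

# Square symmetric matrix -> condensed vector of the 21 pairwise distances.
condensed = squareform(distance, checks=True)

# Average linkage is an illustrative choice.
Z = sch.linkage(condensed, method="average")

labels = ["1", "2", "3", "4", "5", "6", "7"]
# no_plot=True returns the leaf layout only; drop it to draw the figure.
tree = sch.dendrogram(Z, labels=labels, no_plot=True)
leaf_order = tree["ivl"]
```

In this form the vertical axis of the dendrogram shows the dissimilarity (here 1 - similarity) at which two clusters merge, and leaf_order gives the left-to-right label order, which follows the tree structure and not the input order.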

Using R to cluster based on euclidean distance and a complete linkage metric, too many vectors?

这一生的挚爱 posted on 2019-12-02 04:02:32
I am trying to figure out how to read a counts matrix into R, and then cluster based on Euclidean distance and a complete linkage metric. The original matrix has 56,000 rows (genes) and 7 columns (treatments). I want to see if there is a clustering relationship between the treatments. However, every time I try to do this, I first get an error stating: Error: cannot allocate vector of size 544.4 Gb Since I'm trying to reproduce work that has been published by someone else, I am wondering if I am making a mistake with my initial data entry. Second, if I try such clustering with just 20 genes
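If the goal is a relationship between the 7 treatments, the pairwise distances are needed between columns and not between the 56,000 genes; R's dist() works on rows, so the matrix would have to be transposed first. The sketch below illustrates the same idea in Python with SciPy instead of R. The counts are synthetic and the treatment names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# Synthetic stand-in for the counts matrix: 56,000 genes x 7 treatments.
counts = rng.poisson(lam=20, size=(56_000, 7)).astype(float)
treatments = [f"T{i}" for i in range(1, 8)]  # hypothetical names

# Transpose so that each treatment is one observation vector.
# 7 observations give only 7 * 6 / 2 = 21 pairwise distances.
condensed = pdist(counts.T, metric="euclidean")

# Complete linkage on the treatment-to-treatment distances.
Z = linkage(condensed, method="complete")

# no_plot=True returns the leaf order only; drop it to draw the tree.
order = dendrogram(Z, labels=treatments, no_plot=True)["ivl"]
```

Clustering the untransposed matrix would instead require all pairwise distances between 56,000 genes, which is where the memory cost grows quadratically.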

How to adjust sizes of x-axis in dendrogram (R)?

那年仲夏 posted on 2019-12-01 06:59:51
I would like to adjust the x-axis in a dendrogram so that all the labels are visible, for large data sets. As an example, I use the iris data here: > iris.data=subset(iris,select=-Species) > d <- dist(iris.data, method="euclidean") > hc <- hclust(d, "ward") > plot(hc, hang=-1, main="Dendrogram of Ward's Method", label=iris$Species) After the plot function is used, the dendrogram will be like this: So how can I adjust the x-axis so that all the species are clearly visible? As @Roman Luštrik said, you can do it like this: png("plotdendogram.png",width=1600,height=800) par(cex=1,font=3) plot(hc, hang

How to color a dendrogram's labels according to defined groups? (in R)

ぃ、小莉子 posted on 2019-12-01 06:29:39
I have a numeric matrix in R with 24 rows and 10,000 columns. The row names of this matrix are basically file names from which I have read the data corresponding to each of the 24 rows. Apart from this, I have a separate factor list with 24 entries, specifying the group to which each of the 24 files belongs. There are 3 groups: Alcohols, Hydrocarbon and Ester. The names and the corresponding groups to which they belong look like this: > MS.mz [1] "int-354.19" "int-361.35" "int-368.35" "int-396.38" "int-408.41" "int-410.43" "int-422.43" [8] "int-424.42" "int-436.44" "int-438.46" "int-452.00" "int-480.48"

How to adjust sizes of x-axis in dendrogram (R)?

∥☆過路亽.° posted on 2019-12-01 05:02:44
Question: I would like to adjust the x-axis in a dendrogram so that all the labels are visible, for large data sets. As an example, I use the iris data here: > iris.data=subset(iris,select=-Species) > d <- dist(iris.data, method="euclidean") > hc <- hclust(d, "ward") > plot(hc, hang=-1, main="Dendrogram of Ward's Method", label=iris$Species) After the plot function is used, the dendrogram will be like this: So how can I adjust the x-axis so that all the species are clearly visible? Answer 1: Like @Roman Luštrik