Extracting clusters from seaborn clustermap

失恋的感觉 2020-12-23 14:45

I am using the seaborn clustermap to create clusters and visually it works great (this example produces very similar results).

However I am having troub

2 answers
  • 2020-12-23 15:05

    You probably want a new column in your dataframe that holds the cluster membership. I managed to do this by assembling snippets of code found around the web:

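    from collections import defaultdict

    import seaborn
    from scipy.cluster import hierarchy

    g = seaborn.clustermap(df, method='average')
    # Use the row linkage so that the leaves line up with the labels in df.index
    # (use g.dendrogram_col.linkage together with df.columns to cluster columns).
    den = hierarchy.dendrogram(g.dendrogram_row.linkage,
                               labels=list(df.index),
                               color_threshold=0.60)

    def get_cluster_classes(den, label='ivl'):
        """Map each dendrogram link color to the leaf labels drawn in that color."""
        cluster_idxs = defaultdict(list)
        for c, pi in zip(den['color_list'], den['icoord']):
            # pi[1:3] are the x-positions of the two legs of one link.
            # Leaves sit at x = 5, 15, 25, ..., so a leg at such a position
            # is treated as ending in leaf number (x - 5) / 10.
            for leg in pi[1:3]:
                i = (leg - 5.0) / 10.0
                if abs(i - int(i)) < 1e-5:
                    cluster_idxs[c].append(int(i))

        # Translate leaf positions into the labels stored in den['ivl'].
        cluster_classes = {}
        for c, l in cluster_idxs.items():
            cluster_classes[c] = [den[label][i] for i in l]

        return cluster_classes

    clusters = get_cluster_classes(den)

    # Invert the mapping (color -> labels) and look up every row label;
    # rows that were not assigned to any color get None.
    label_to_cluster = {lab: c for c, labs in clusters.items() for lab in labs}
    df["cluster"] = [label_to_cluster.get(i) for i in df.index]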
    

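    This gives you a column holding the dendrogram color of each row's cluster, e.g. 'g' or 'r' for the green- and red-labeled clusters (newer SciPy versions may report color codes such as 'C1' and 'C2' instead). I choose my color_threshold by plotting the dendrogram and eyeballing the y-axis values.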
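As a rough alternative to eyeballing the plot, the merge heights stored in the third column of the linkage matrix can be inspected directly. The sketch below is only a heuristic and not part of the answer above: `suggest_threshold` is a hypothetical helper, the toy `points` array is made up, and cutting in the middle of the largest gap between successive merge heights is an assumption that will not suit every dataset.

```python
import numpy as np
from scipy.cluster import hierarchy


def suggest_threshold(linkage_matrix):
    """Return a cut height in the middle of the largest gap between merge heights."""
    # Column 2 of a SciPy linkage matrix holds the distance at which each merge happens.
    heights = np.sort(linkage_matrix[:, 2])
    gaps = np.diff(heights)
    k = int(np.argmax(gaps))
    return float((heights[k] + heights[k + 1]) / 2)


# Toy data (made up for illustration): two well-separated groups of points.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
Z = hierarchy.linkage(points, method='average')

threshold = suggest_threshold(Z)
print("merge heights:", np.sort(Z[:, 2]))
print("suggested threshold:", threshold)
```

With a real clustermap, the same helper could be applied to g.dendrogram_row.linkage and the returned value passed as color_threshold, though checking it against the plotted dendrogram is still advisable.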

  • 2020-12-23 15:17

    While using result.dendrogram_col.linkage or result.dendrogram_row.linkage will currently work, it seems to be an implementation detail. The safest route is to compute the linkages explicitly first and pass them to the clustermap function, which has row_linkage and col_linkage parameters for exactly that purpose.

    Replacing the last line in your example (result = ...) with the following code gives the same result as before, but you will also have row_linkage and col_linkage variables that you can use with fcluster etc.

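    import numpy as np
    import seaborn as sns
    from scipy.spatial import distance
    from scipy.cluster import hierarchy

    # Compute the correlation matrix once and reuse it.
    correlations = df.corr()
    correlations_array = np.asarray(correlations)

    row_linkage = hierarchy.linkage(
        distance.pdist(correlations_array), method='average')

    col_linkage = hierarchy.linkage(
        distance.pdist(correlations_array.T), method='average')

    # network_colors and cmap come from the question's example (not shown here).
    # method= is omitted: it is not needed when the linkages are passed in explicitly.
    result = sns.clustermap(correlations, row_linkage=row_linkage, col_linkage=col_linkage,
                            row_colors=network_colors, col_colors=network_colors,
                            figsize=(13, 13), cmap=cmap)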
    

    In this particular example, the code could be simplified further: the correlation matrix is symmetric, so row_linkage and col_linkage will be identical.

    Note: A previous version of this answer included a call to distance.squareform, mirroring what the code in seaborn does, but that is a bug.
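To illustrate the simplification and the fcluster step mentioned in this answer, here is a minimal sketch under stated assumptions: the toy dataframe is made up, a single linkage is shared by rows and columns because the correlation matrix is symmetric, and cutting the tree into two clusters with criterion='maxclust' is an arbitrary choice for the example.

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy
from scipy.spatial import distance

# Toy data (made up): columns a/b move together, and so do columns c/d.
rng = np.random.default_rng(0)
base1 = rng.normal(size=200)
base2 = rng.normal(size=200)
df = pd.DataFrame({
    'a': base1 + 0.05 * rng.normal(size=200),
    'b': base1 + 0.05 * rng.normal(size=200),
    'c': base2 + 0.05 * rng.normal(size=200),
    'd': base2 + 0.05 * rng.normal(size=200),
})
correlations = df.corr()

# One linkage serves both rows and columns because the matrix is symmetric.
linkage = hierarchy.linkage(
    distance.pdist(np.asarray(correlations)), method='average')

# Cut the tree into flat clusters; t=2 with 'maxclust' asks for at most two clusters.
membership = pd.Series(
    hierarchy.fcluster(linkage, t=2, criterion='maxclust'),
    index=correlations.index, name='cluster')
print(membership)
```

The same linkage array can then be passed to clustermap as both row_linkage and col_linkage, so the heatmap ordering and the extracted membership come from one tree.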
