Custom cluster colors of SciPy dendrogram in Python (link_color_func?)

感情败类 2020-12-28 08:42

I want to color my clusters with a color map that I made in the form of a dictionary (i.e. {leaf: color}).

I've tried following https://joernhees.de/

3 answers
  • 2020-12-28 09:01

    Here is a solution that uses link_color_func together with the matrix Z returned by linkage() (its layout is described early in the docs, but is a little hidden):

    # see question for code prior to "color mapping"
    
    # Color mapping
    dflt_col = "#808080"   # Unclustered gray
    D_leaf_colors = {"attr_1": dflt_col,
    
                     "attr_4": "#B061FF", # Cluster 1 indigo
                     "attr_5": "#B061FF",
                     "attr_2": "#B061FF",
                     "attr_8": "#B061FF",
                     "attr_6": "#B061FF",
                     "attr_7": "#B061FF",
    
                     "attr_0": "#61ffff", # Cluster 2 cyan
                     "attr_3": "#61ffff",
                     "attr_9": "#61ffff",
                     }
    
    # notes:
    # * rows in Z correspond to "inverted U" links that connect clusters
    # * rows are ordered by increasing distance
    # * if the colors of the connected clusters match, use that color for link
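    link_cols = {}
    n_leaves = len(Z) + 1  # Z has one row per merge, i.e. n_leaves - 1 rows
    for i, pair in enumerate(Z[:, :2].astype(int)):
        # ids below n_leaves are original leaves; larger ids are clusters built by earlier rows
        c1, c2 = (link_cols[x] if x >= n_leaves else D_leaf_colors["attr_%d" % x]
                  for x in pair)
        link_cols[n_leaves + i] = c1 if c1 == c2 else dflt_col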
    
    # Dendrogram
    D = dendrogram(Z=Z, labels=DF_dism.index, color_threshold=None,
      leaf_font_size=12, leaf_rotation=45, link_color_func=lambda x: link_cols[x])
    

    Here is the output:

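The same idea can be wrapped in a small helper and checked without plotting. This is only a minimal sketch on made-up toy data (six 2-D points forming two obvious groups). The helper name `link_colors_from_leaves` and the dictionary keyed by leaf index are assumptions for illustration, not part of the answer above. It relies on `dendrogram(..., no_plot=True)` returning the per-link colors under the `color_list` key.

```python
from collections import Counter

import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage


def link_colors_from_leaves(Z, leaf_colors, default="#808080"):
    """Map each link id (n_leaves + row index of Z) to a color.

    A link inherits a color only if both clusters it joins share that color;
    otherwise it falls back to `default`.
    """
    n_leaves = len(Z) + 1
    link_cols = {}
    for i, (a, b) in enumerate(Z[:, :2].astype(int)):
        c1, c2 = (link_cols[x] if x >= n_leaves else leaf_colors[x]
                  for x in (int(a), int(b)))
        link_cols[n_leaves + i] = c1 if c1 == c2 else default
    return link_cols


# Toy data (hypothetical): two tight groups that are far apart
points = np.array([[0, 0], [0, 1], [1, 0],
                   [10, 10], [10, 11], [11, 10]], dtype=float)
Z = linkage(points, method="average")

# {leaf index: color}; leaves 0-2 form one group, 3-5 the other
leaf_colors = {0: "#B061FF", 1: "#B061FF", 2: "#B061FF",
               3: "#61ffff", 4: "#61ffff", 5: "#61ffff"}
link_cols = link_colors_from_leaves(Z, leaf_colors)

# no_plot=True skips drawing but still computes the color of every link
R = dendrogram(Z, no_plot=True, link_color_func=lambda k: link_cols[k])
link_color_counts = Counter(R["color_list"])
```

On this toy data the two links inside each group keep the group color, and only the top link that joins the two groups falls back to the default gray.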
  • 2020-12-28 09:06

    I found a hackish solution. It requires the color threshold (I needed it to reproduce the original coloring; without it the colors differ from those shown in the question), but it could lead you to a solution. However, you may not have enough information to know in which order to list the colors in the palette.

    # Init
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns; sns.set()
    
    # Load data
    from sklearn.datasets import load_diabetes
    
    # Clustering
    from scipy.cluster.hierarchy import dendrogram, fcluster, leaves_list, set_link_color_palette
    from scipy.spatial import distance
    from fastcluster import linkage # You can use SciPy one too
    
    # IPython/Jupyter magic; omit this line when running as a plain script
    %matplotlib inline
    # Dataset
    A_data = load_diabetes().data
    DF_diabetes = pd.DataFrame(A_data, columns = ["attr_%d" % j for j in range(A_data.shape[1])])
    
    # Absolute value of correlation matrix, then subtract from 1 for dissimilarity
    DF_dism = 1 - np.abs(DF_diabetes.corr())
    
    # Compute average linkage
    # DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it
    A_dist = distance.squareform(DF_dism.to_numpy())
    Z = linkage(A_dist,method="average")
    
    # The {leaf: color} mapping dict is not used in this approach
    # Dendrogram
    # To get the dendrogram coloring shown below, use `color_threshold=0.7`
    # Change the color palette; grey is left out because it is passed separately
    # as the color for links above the threshold
    set_link_color_palette(["#B061FF", "#61ffff"])
    D = dendrogram(Z=Z, labels=DF_dism.index, color_threshold=.7, leaf_font_size=12, leaf_rotation=45, 
                   above_threshold_color="grey")
    

    The result:

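Since the palette order is the uncertain part of this approach, one way to inspect it is to compute the dendrogram without drawing it and look at the colors that were actually assigned. The following is a hedged sketch on made-up toy data; the points and the threshold of 3.0 are assumptions chosen so that two groups fall below the cut.

```python
from collections import Counter

import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage, set_link_color_palette

# Toy data (hypothetical): two tight groups that are far apart
points = np.array([[0, 0], [0, 1], [1, 0],
                   [10, 10], [10, 11], [11, 10]], dtype=float)
Z = linkage(points, method="average")

palette = ["#B061FF", "#61ffff"]
set_link_color_palette(palette)

# no_plot=True returns the computed layout and colors without drawing anything
R = dendrogram(Z, no_plot=True, color_threshold=3.0,
               above_threshold_color="grey")

# Restore SciPy's default palette so later plots are unaffected
set_link_color_palette(None)

# R["color_list"] holds one color per link; R["ivl"] holds the leaf labels
# in left-to-right plotting order
link_color_counts = Counter(R["color_list"])
```

Comparing `R["color_list"]` with the leaf order in `R["ivl"]` shows which group received which palette entry, so the palette can be reordered before the final plot.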
  • 2020-12-28 09:08

    Two-liner for applying custom colormap to cluster branches:

    import numpy as np
    import matplotlib as mpl
    from matplotlib.pyplot import cm
    from scipy.cluster import hierarchy
    
    cmap = cm.rainbow(np.linspace(0, 1, 10))
    hierarchy.set_link_color_palette([mpl.colors.rgb2hex(rgb[:3]) for rgb in cmap])
    

    You can then replace rainbow with any other colormap and change 10 to the number of clusters you want.
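If the number of clusters is not known in advance, it can be derived from the cut height instead of being hard-coded. This is a sketch under stated assumptions: the toy points and the threshold of 3.0 are made up, and `fcluster` with `criterion="distance"` is used here as one way to count the groups below the cut.

```python
import numpy as np
from matplotlib import cm
from matplotlib import colors as mcolors
from scipy.cluster.hierarchy import fcluster, linkage

# Toy data (hypothetical): two tight groups that are far apart
points = np.array([[0, 0], [0, 1], [1, 0],
                   [10, 10], [10, 11], [11, 10]], dtype=float)
Z = linkage(points, method="average")

# Hypothetical cut height; reuse the same value as color_threshold when plotting
threshold = 3.0
n_clusters = int(fcluster(Z, t=threshold, criterion="distance").max())

# Sample one RGBA color per cluster and convert each to a hex string
rgba = cm.rainbow(np.linspace(0, 1, n_clusters))
palette = [mcolors.to_hex(c) for c in rgba]
```

The resulting `palette` can be passed to `hierarchy.set_link_color_palette`. A cluster that consists of a single leaf has no links below the cut, so fewer colors than `n_clusters` may end up being used.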
