Plot dendrogram using sklearn.AgglomerativeClustering

刺人心 2021-01-31 15:17

I'm trying to build a dendrogram using the children_ attribute provided by AgglomerativeClustering, but so far I'm out of luck. I can't use sc

5 answers
  •  有刺的猬
    2021-01-31 15:21

    I came across the exact same problem some time ago. The way I managed to plot the damn dendrogram was with the ete3 package, which can plot trees flexibly with various options. The only difficulty was converting sklearn's children_ output to the Newick tree format that ete3 can read. Furthermore, I needed to compute the dendrite's span manually, because that information is not provided with children_. Here is a snippet of the code I used. It computes the Newick tree and then shows the ete3 Tree data structure. For more details on how to plot, see the ete3 documentation.

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering
    import ete3
    
    def build_Newick_tree(children,n_leaves,X,leaf_labels,spanner):
        """
        build_Newick_tree(children,n_leaves,X,leaf_labels,spanner)
    
        Get a string representation (Newick tree) from the sklearn
        AgglomerativeClustering.fit output.
    
        Input:
            children: AgglomerativeClustering.children_
            n_leaves: AgglomerativeClustering.n_leaves_
            X: parameters supplied to AgglomerativeClustering.fit
            leaf_labels: The label of each parameter array in X
            spanner: Callable that computes the dendrite's span
    
        Output:
            ntree: A str with the Newick tree representation
    
        """
        return go_down_tree(children,n_leaves,X,leaf_labels,len(children)+n_leaves-1,spanner)[0]+';'
    
    def go_down_tree(children,n_leaves,X,leaf_labels,nodename,spanner):
        """
        go_down_tree(children,n_leaves,X,leaf_labels,nodename,spanner)
    
        Recursive function that traverses the subtree that descends from
        nodename and returns the Newick representation of the subtree.
    
        Input:
            children: AgglomerativeClustering.children_
            n_leaves: AgglomerativeClustering.n_leaves_
            X: parameters supplied to AgglomerativeClustering.fit
            leaf_labels: The label of each parameter array in X
            nodename: An int that is the intermediate node name whose
                children are located in children[nodename-n_leaves].
            spanner: Callable that computes the dendrite's span
    
        Output:
            ntree: A str with the Newick tree representation
    
        """
        nodeindex = nodename-n_leaves
        if nodename
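
The snippet above is cut off partway through go_down_tree. What follows is a rough, self-contained sketch of the same idea and not the original author's code. It assumes children uses the children_ layout (node ids below n_leaves are samples, and node n_leaves + i is the result of merge i). The names newick_from_children and max_pairwise_distance are hypothetical, and the spanner shown here (largest pairwise Euclidean distance in a cluster) is only one possible choice. It uses plain Python lists so it runs without sklearn. With a fitted model you would presumably pass something like model.children_.tolist() and X.tolist().

```python
import itertools
import math


def max_pairwise_distance(samples):
    """Hypothetical spanner: largest Euclidean distance between two samples."""
    return max(
        (math.dist(a, b) for a, b in itertools.combinations(samples, 2)),
        default=0.0,  # a single sample has zero span
    )


def newick_from_children(children, n_leaves, X, leaf_labels, spanner):
    """Build a Newick string from a merge list in the children_ layout."""

    def descend(node):
        # Node ids below n_leaves are the original samples (leaves).
        if node < n_leaves:
            return str(leaf_labels[node]), [X[node]]
        # Internal node: its two children are stored at row node - n_leaves.
        left, right = children[node - n_leaves]
        left_str, left_samples = descend(int(left))
        right_str, right_samples = descend(int(right))
        samples = left_samples + right_samples
        span = spanner(samples)
        # Branch length: how much the span grows from child to parent.
        left_len = span - spanner(left_samples)
        right_len = span - spanner(right_samples)
        return f"({left_str}:{left_len:.2f},{right_str}:{right_len:.2f})", samples

    root = len(children) + n_leaves - 1  # the last merge is the root
    return descend(root)[0] + ";"


# Toy input: three 1-D points; merge 0 joins a and b, merge 1 joins c with that pair.
X = [[0.0], [1.0], [5.0]]
children = [[0, 1], [2, 3]]
tree = newick_from_children(children, 3, X, ["a", "b", "c"], max_pairwise_distance)
print(tree)  # (c:5.00,(a:1.00,b:1.00):4.00);
```

The resulting string can then be handed to ete3, for example with ete3.Tree(tree), for plotting. Because the traversal is recursive, a very deep tree may hit Python's recursion limit.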
