Plot dendrogram using sklearn.AgglomerativeClustering

刺人心 2021-01-31 15:17

I'm trying to build a dendrogram using the children_ attribute provided by AgglomerativeClustering, but so far I'm out of luck. I can't use sc…

5 Answers
  •  后悔当初
    2021-01-31 15:36

    For those willing to step outside Python and use the robust D3 library, it's not especially difficult to use the d3.cluster() (or d3.tree()) layout to get a nice, customizable dendrogram.

    See the JSFiddle for a demo.

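    Conveniently, the children_ array maps directly onto a JS array of [left, right] pairs, so the only intermediate step is to use d3.stratify() to turn it into a hierarchy. Row i of children_ holds the two nodes merged at step i, and that merge creates node n_samples + i (values below n_samples are the original samples). So each row needs an id (its own node number) and a parentId (the node number of the later merge that contains it):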

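    // Assumes d3 (d3-hierarchy) is loaded and `data` is your children_ array,
    // e.g. exported from Python with model.children_.tolist().
    const N = 272;  // Your n_samples/corpus size.
    const root = d3.stratify()
      // Row i of children_ is the merge that created node i + N.
      .id((d, i) => i + N)
      .parentId((d, i) => {
        // The parent is the later merge whose row contains this node's id.
        const parIndex = data.findIndex(e => e.includes(i + N));
        if (parIndex < 0) {
          return; // The root should have an undefined parentId.
        }
        return parIndex + N;
      })(data); // Your children_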
    

    The findIndex call makes this O(n^2), since it rescans the array for every node, but that probably doesn't matter until n_samples gets large; at that point you could precompute a more efficient child-to-parent index.
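
    A minimal sketch of such an index, assuming children_ has been exported to JS as an array of [left, right] pairs; buildParentLookup is a hypothetical helper name, not part of D3 or sklearn. Because row i is the merge that creates node nSamples + i, a single pass records the parent of every node:

    ```javascript
    // Hypothetical helper: map each node id to its parent id in one O(n) pass.
    // Row i of children_ is the merge that created node nSamples + i,
    // so both entries of that row have parent nSamples + i.
    function buildParentLookup(children, nSamples) {
      const parentOf = new Map();
      children.forEach(([left, right], i) => {
        parentOf.set(left, nSamples + i);
        parentOf.set(right, nSamples + i);
      });
      return parentOf;
    }

    // Small hand-made example: 4 samples (ids 0-3), merges 0+1 -> 4, 2+3 -> 5, 4+5 -> 6.
    const children = [[0, 1], [2, 3], [4, 5]];
    const parentOf = buildParentLookup(children, 4);
    console.log(parentOf.get(4)); // 6
    console.log(parentOf.get(6)); // undefined (node 6 is the root)
    ```

    With the lookup built once, the parentId accessor above can become (d, i) => parentOf.get(i + N), which still returns undefined for the root, so d3.stratify() treats it the same way.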

    Beyond that, it's pretty much plug and chug use of d3.cluster(). See mbostock's canonical block or my JSFiddle.

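    N.B. For my use case it was enough to show only the non-leaf nodes. Visualising the samples (leaves) is a bit trickier, since they have no rows of their own in children_: they appear only as child indices below n_samples, and if the tree was not computed in full, some may not appear at all.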
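
    One way to include the leaves, sketched under the assumption that children_ describes a complete tree (so its last row is the root): skip d3.stratify() and build a nested object yourself, then hand it to d3.hierarchy(), which reads a children property by default. toNestedTree is a hypothetical helper; it walks the rows in merge order, which works because each row only refers to samples or to merges from earlier rows.

    ```javascript
    // Hypothetical helper: convert sklearn's children_ into a nested tree that
    // contains both merge nodes (ids >= nSamples) and leaves (ids < nSamples).
    // Assumes a complete tree, so the last row of children_ is the root.
    function toNestedTree(children, nSamples) {
      const nodes = new Map(); // merge-node id -> node object
      // Leaves have no row of their own, so create them on demand.
      const getNode = (id) => nodes.get(id) ?? { id };
      children.forEach(([left, right], i) => {
        const id = nSamples + i;
        // Rows are in merge order, so any merge node referenced here already exists.
        nodes.set(id, { id, children: [getNode(left), getNode(right)] });
      });
      return nodes.get(nSamples + children.length - 1);
    }

    // Same 4-sample example: merges 0+1 -> 4, 2+3 -> 5, 4+5 -> 6.
    const tree = toNestedTree([[0, 1], [2, 3], [4, 5]], 4);
    console.log(JSON.stringify(tree));
    // {"id":6,"children":[{"id":4,"children":[{"id":0},{"id":1}]},{"id":5,"children":[{"id":2},{"id":3}]}]}
    ```

    In the browser, d3.cluster()(d3.hierarchy(tree)) would then lay out every sample as a leaf. The loop is iterative rather than recursive, so deep, chain-like trees don't risk a stack overflow while building the object.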
