I'm trying to build a dendrogram using the children_ attribute provided by AgglomerativeClustering, but so far I'm out of luck. I can't use sc
For those willing to step out of Python and use the robust D3 library, it's not super difficult to use the d3.cluster() (or, I guess, d3.tree()) APIs to achieve a nice, customizable result. See the jsfiddle for a demo.
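The children_ array luckily functions easily as a JS array, and the only intermediary step is to use d3.stratify() to turn it into a hierarchical representation. In scikit-learn's convention, row i of children_ lists the two children of node i + N, where N is n_samples, and ids below N are the original samples. Specifically, we need each node to have an id and a parentId: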
var N = 272; // Your n_samples/corpus size.
// `data` is your children_ array: row i holds the two children of node i + N.
var root = d3.stratify()
    .id((d, i) => i + N)
    .parentId((d, i) => {
        // The parent is the row that lists node i + N as one of its children.
        var parIndex = data.findIndex(e => e.includes(i + N));
        if (parIndex < 0) {
            return; // The root should have an undefined parentId.
        }
        return parIndex + N;
    })(data);
You end up with at least O(n^2) behaviour here due to the findIndex line, but it probably doesn't matter until your n_samples becomes huge, in which case you could precompute a more efficient index.
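One way to precompute that index, as a rough sketch: walk children_ once and record each child's parent in a Map, so the parentId accessor becomes a constant-time lookup. The helper name buildParentIndex and the toy childrenRows data are made up for illustration; childrenRows stands in for children_ exported from scikit-learn as nested arrays.

```javascript
// Hypothetical helper: map every child node id to its parent's node id.
// `children` is children_ as an array of [left, right] pairs; `nSamples` is N.
function buildParentIndex(children, nSamples) {
  const parentOf = new Map();
  children.forEach((pair, row) => {
    const parentId = row + nSamples; // row i describes node i + N
    pair.forEach(child => parentOf.set(child, parentId));
  });
  return parentOf;
}

// Toy example (made-up data): 4 samples, so 3 merges.
const childrenRows = [[0, 1], [2, 3], [4, 5]];
const parentOf = buildParentIndex(childrenRows, 4);

// Possible replacement for the findIndex-based accessor:
// .parentId((d, i) => parentOf.get(i + N))
```

Map.get returns undefined for the root, since no row lists it as a child, which is what d3.stratify() expects for the root's parentId.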
Beyond that, it's pretty much plug and chug use of d3.cluster(). See mbostock's canonical block or my JSFiddle.
N.B. For my use case, it sufficed merely to show non-leaf nodes; it's a bit trickier to visualise the samples/leaves, since these might not all be in the children_ array explicitly.
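If you do want the leaves, one possible approach (a sketch, not what the JSFiddle does) is to build a flat table with one row per node, leaves included, and hand that table to d3.stratify() with plain d => d.id and d => d.parentId accessors. It assumes the full tree was computed, so that every sample appears somewhere in children_; any node that never appears as a child ends up without a parent, and d3.stratify() rejects data with more than one root. The helper name toNodeTable, the isLeaf field, and the sample data are illustrative.

```javascript
// Hypothetical helper: one {id, parentId, isLeaf} row for every node in the tree.
// `children` is children_ as an array of [left, right] pairs; `nSamples` is N.
function toNodeTable(children, nSamples) {
  const total = nSamples + children.length; // leaves + internal nodes
  const nodes = [];
  for (let id = 0; id < total; id++) {
    nodes.push({ id: id, parentId: undefined, isLeaf: id < nSamples });
  }
  // Row i of children_ names the two children of node i + N.
  children.forEach((pair, row) => {
    pair.forEach(child => { nodes[child].parentId = row + nSamples; });
  });
  return nodes;
}

// Toy example (made-up data): 4 samples, 3 merges, 7 nodes in total.
const nodeTable = toNodeTable([[0, 1], [2, 3], [4, 5]], 4);

// With D3 loaded, this table could be stratified directly:
// var root = d3.stratify().id(d => d.id).parentId(d => d.parentId)(nodeTable);
```

The isLeaf flag is only there so the drawing code can style samples differently from merge nodes.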