How to get centroids from SciPy's hierarchical agglomerative clustering?

我在风中等你 2021-02-13 04:15

I am using SciPy's hierarchical agglomerative clustering methods to cluster an m x n matrix of features, but after the clustering is complete, I can't seem to figure out how to

2 Answers
  • 2021-02-13 05:04

    One option is a function that returns a codebook of centroids, as kmeans in scipy.cluster.vq does. All you need is the flat-cluster partition vector part and the original observations X:

    import numpy as np

    def to_codebook(X, part):
        """
        Calculates centroids according to flat cluster assignment

        Parameters
        ----------
        X : array, (n, d)
            The n original observations with d features

        part : array, (n,)
            Partition vector. part[i] = c is the cluster assigned to observation i

        Returns
        -------
        codebook : array, (k, d)
            A k x d codebook with one centroid per cluster label, in sorted label order
        """
        # Iterate over the labels actually present, so a gap in the numbering
        # cannot produce a NaN row from the mean of an empty selection.
        labels = np.unique(part)
        return np.vstack([X[part == label].mean(axis=0) for label in labels])
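
    To get the partition vector in the first place, the linkage matrix can be cut into flat clusters with fcluster. The following is a minimal sketch, assuming Ward linkage, a two-cluster cut and toy data made up for illustration; to_codebook is restated compactly so the snippet runs on its own:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def to_codebook(X, part):
    """Mean of the observations in each flat cluster, one row per label."""
    return np.vstack([X[part == label].mean(axis=0) for label in np.unique(part)])


# Toy data (assumed for illustration): two well-separated groups in 2-D.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

Z = linkage(X, method="ward")                   # build the cluster hierarchy
part = fcluster(Z, t=2, criterion="maxclust")   # cut it into at most 2 flat clusters
codebook = to_codebook(X, part)                 # one centroid row per cluster label
print(codebook)
```

    Which of the two groups receives label 1 is up to fcluster, so the row order of the codebook should be read together with the labels in part rather than assumed.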
    
  • 2021-02-13 05:10

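    You can do something like this, where D is the number of dimensions, T holds the flat cluster number of each observation, and features is the observation matrix: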

    import numpy as np

    # Sum the vectors in each cluster
    lens = {}       # number of observations in each cluster
    centroids = {}  # running sum, later the centroid, of each cluster
    for idx, clno in enumerate(T):
        centroids.setdefault(clno, np.zeros(D))
        centroids[clno] += features[idx, :]
        lens.setdefault(clno, 0)
        lens[clno] += 1
    # Divide by the number of observations in each cluster to get the centroid
    for clno in centroids:
        centroids[clno] /= float(lens[clno])
    

    This will give you a dictionary with cluster number as the key and the centroid of the specific cluster as the value.
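
    For larger inputs, the same sums and counts can be accumulated without a Python loop over the observations. This is only a sketch, assuming T is a 1-D array of labels; the function name centroids_by_label and the toy data are hypothetical:

```python
import numpy as np


def centroids_by_label(features, T):
    """Return {cluster label: centroid}, computed with vectorized sums."""
    features = np.asarray(features, dtype=float)
    # inverse[i] is the row index (0..k-1) of the label assigned to observation i
    labels, inverse = np.unique(np.asarray(T).ravel(), return_inverse=True)
    sums = np.zeros((labels.size, features.shape[1]))
    np.add.at(sums, inverse, features)           # unbuffered per-cluster row sums
    counts = np.bincount(inverse, minlength=labels.size)
    return {label.item(): sums[i] / counts[i] for i, label in enumerate(labels)}


# Toy data (assumed for illustration): two observations in cluster 1, one in cluster 2.
features = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]])
T = np.array([1, 1, 2])
centroids = centroids_by_label(features, T)
print(centroids)
```

    As with the loop version, the result is keyed by cluster number, so it can be used as a drop-in replacement wherever the dictionary is consumed.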
