How to assign a new observation to existing KMeans clusters based on nearest-cluster-centroid logic in Python?

I used the code below to create k-means clusters with scikit-learn.

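from sklearn.cluster import KMeans

# note: newer scikit-learn releases no longer accept n_jobs here
kmean = KMeans(n_clusters=nclusters, n_jobs=-1, random_state=2376, max_iter=1000, n_init=100)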

3 answers

栀梦, 2021-01-15 03:02

According to the scikit-learn KMeans documentation, once you load the pickle file containing the stored KMeans model, predict(X, sample_weight=None) predicts the closest cluster that each sample in X belongs to.
In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.
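To make the code-book interpretation concrete, here is a minimal sketch on synthetic data (the blobs and the new points are illustrative assumptions): it computes the squared Euclidean distance from each new point to every row of cluster_centers_ and takes the argmin, which should agree with predict.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative training data: two well-separated blobs (assumed, not from the question)
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
                     rng.normal(5.0, 0.5, size=(50, 2))])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

# Hypothetical new observations to assign
X_new = np.array([[0.2, -0.1], [4.8, 5.3], [1.0, 1.5]])

# Nearest-centroid rule: distance from every new point to every code in the code book
diffs = X_new[:, np.newaxis, :] - model.cluster_centers_[np.newaxis, :, :]
sq_dist = (diffs ** 2).sum(axis=2)        # shape (n_new, n_clusters)
manual_labels = sq_dist.argmin(axis=1)    # index of the closest centroid

pred_labels = model.predict(X_new)
print(manual_labels, pred_labels)
```

The cluster indices themselves are arbitrary; what matters is that both approaches pick the same centroid for each row.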
Practical note!

People often take the cluster labels from model.labels_, but those are the labels assigned to the training data. When predicting, use the result that predict returns instead, such as pred_y in the following example:
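    import pickle

    from sklearn.cluster import KMeans

    # load the model; filename is the path of the pickled KMeans model
    with open(filename, 'rb') as f:
        model = pickle.load(f)

    # predict using the loaded model; X has shape (n_samples, n_features)
    pred_y = model.predict(X)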
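A self-contained sketch of the practical note, assuming toy 1-D data and an in-memory pickle round trip instead of a file (so no filename is needed): labels_ stays tied to the training rows, while predict returns one label per new observation.

```python
import pickle

import numpy as np
from sklearn.cluster import KMeans

# Toy training data (illustrative): three points near 0 and three near 10
X_train = np.array([[0.0], [0.5], [1.0], [10.0], [10.5], [11.0]])
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

# Round-trip through pickle in memory; writing to a file works the same way
restored = pickle.loads(pickle.dumps(model))

# New observations that were never part of the training set
X = np.array([[0.2], [10.8]])
pred_y = restored.predict(X)

print(restored.labels_)   # one label per training row (length 6)
print(pred_y)             # one label per new row (length 2)
```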

失恋的感觉, 2021-01-15 03:03

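This question is a bit old, but KMeans sets a cluster_centers_ attribute when it is fitted. If you already have the centroids, you can set it directly:
kmeans.cluster_centers_ = centroids_init
Be aware that calling fit afterwards recomputes and overwrites cluster_centers_; if the goal is to start fitting from those centroids, pass them as init=centroids_init (with n_init=1) instead.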
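If all you have is the centroid array (for example, saved from an earlier run), here is a sketch under that assumption: sklearn.metrics.pairwise_distances_argmin applies the nearest-centroid rule directly without touching KMeans attributes, and passing the array as init with n_init=1 starts a new fit from those centroids. centroids_init and the data below are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

# Hypothetical centroids kept from an earlier clustering run
centroids_init = np.array([[0.0, 0.0], [5.0, 5.0]])

# New observations to assign
X_new = np.array([[0.3, -0.2], [4.6, 5.1], [5.2, 4.9]])

# Index of the closest centroid (Euclidean) for each row of X_new
labels = pairwise_distances_argmin(X_new, centroids_init)
print(labels)  # → [0 1 1]

# To fit starting from the known centroids instead of k-means++
X_train = np.vstack([X_new, [[0.1, 0.4], [-0.2, 0.1]]])
kmeans = KMeans(n_clusters=len(centroids_init), init=centroids_init, n_init=1)
kmeans.fit(X_train)
print(kmeans.predict(X_new))
```

With an explicit init array, cluster i starts at row i of centroids_init, so the label numbering follows the order of the saved centroids.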

旧巷少年郎, 2021-01-15 03:05

Yes. Whether or not the sklearn.cluster.KMeans object has been pickled does not affect your ability to use its predict method to cluster a new observation: if you unpickle it correctly, you are working with the "same" object as the original.

An example:

from sklearn.cluster import KMeans
import joblib  # sklearn.externals.joblib is gone from recent scikit-learn; use the standalone package

model = KMeans(n_clusters = 2, random_state = 100)
X = [[0,0,1,0], [1,0,0,1], [0,0,0,1],[1,1,1,0],[0,0,0,0]]
model.fit(X)


Out:

KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=2, n_init=10,
    n_jobs=1, precompute_distances='auto', random_state=100, tol=0.0001,
    verbose=0)


Continue:

joblib.dump(model, 'model.pkl')  
model_loaded = joblib.load('model.pkl')

model_loaded


Out: 

KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=2, n_init=10,
    n_jobs=1, precompute_distances='auto', random_state=100, tol=0.0001,
    verbose=0)


See how the n_clusters and random_state parameters are the same between the model and model_loaded objects? You're good to go.

Predict with the "new" model:

model_loaded.predict([[0,0,0,0]])  # predict expects a 2D array: one row per observation

Out[64]: array([0])
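To check the loaded model end to end without leaving files behind, here is a minimal sketch assuming the standalone joblib package and a temporary directory (the file name is illustrative): the restored estimator has the same centroids and gives the same predictions as the original.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0, 0, 1, 0], [1, 0, 0, 1], [0, 0, 0, 1], [1, 1, 1, 0], [0, 0, 0, 0]])
model = KMeans(n_clusters=2, n_init=10, random_state=100).fit(X)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")   # illustrative file name
    joblib.dump(model, path)
    model_loaded = joblib.load(path)

# predict expects a 2D array: one row per observation
new_obs = np.array([[0, 0, 0, 0], [1, 1, 0, 1]])
same_centers = np.array_equal(model.cluster_centers_, model_loaded.cluster_centers_)
same_labels = np.array_equal(model.predict(new_obs), model_loaded.predict(new_obs))
print(same_centers, same_labels)  # → True True
```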