How to assign a new observation to existing k-means clusters based on nearest-cluster-centroid logic in Python?

Asked by 长发绾君心 on 2021-01-15 02:28

I used the below code to create k-means clusters using scikit-learn.

    kmean = KMeans(n_clusters=nclusters, n_jobs=-1, random_state=2376,
                   max_iter=1000, n_init=100)
3 Answers
  • 2021-01-15 03:02

    According to the scikit-learn KMeans documentation, calling predict(X, sample_weight=None) after loading the pickle file with the stored KMeans model will predict the closest cluster each sample in X belongs to.

    In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.

    Practical note!
    People often take cluster labels from model.labels_, but in the case of prediction make sure to use the returned result, such as pred_y in the following example:

        from sklearn.cluster import KMeans
        import pickle
    
        # load the model
        model = pickle.load(open(filename, 'rb'))
    
        # predict using the loaded model
        pred_y = model.predict(X)
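To make the code-book remark concrete, here is a minimal sketch (toy data and variable names are made up for illustration) showing that predict is exactly nearest-centroid assignment: it returns the index of the closest row of cluster_centers_.

```python
import numpy as np
from sklearn.cluster import KMeans

# toy data standing in for the original training set
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
model = KMeans(n_clusters=2, random_state=2376, n_init=10).fit(X)

new_obs = np.array([[4.8, 5.2]])

# predict() returns the index of the closest "code" in the code book ...
pred_y = model.predict(new_obs)

# ... which is the argmin of Euclidean distances to the centroids
dists = np.linalg.norm(model.cluster_centers_ - new_obs, axis=1)
nearest = int(np.argmin(dists))
```

The two results agree by construction, which is why the returned labels (not model.labels_, which only covers the training data) are the right thing to use for new observations.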
    
  • 2021-01-15 03:03

    This question is a bit old, but KMeans sets a cluster_centers_ attribute when it fits. If you have the centroids, you can set it by doing:

    kmeans.cluster_centers_ = centroids_init

    It should be able to predict after this.
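A caveat: on recent scikit-learn releases, predict also depends on other attributes set during fitting (for example n_features_in_), so assigning cluster_centers_ alone may raise errors. A more portable sketch (centroids_init is a hypothetical array of known centroids) seeds a one-step fit with the centroids themselves; since each centroid is nearest to itself, the fitted centers come out unchanged:

```python
import numpy as np
from sklearn.cluster import KMeans

# hypothetical centroids recovered from an earlier run
centroids_init = np.array([[0.0, 0.0], [10.0, 10.0]])

# seeding init with the centroids and fitting on them leaves the
# centers unchanged (each centroid is its own nearest center)
kmeans = KMeans(n_clusters=2, init=centroids_init, n_init=1, max_iter=1)
kmeans.fit(centroids_init)

# new observations are now assigned to the known centroids
labels = kmeans.predict(np.array([[0.5, 0.1], [9.0, 11.0]]))
```

This keeps all the internal fitted state consistent while still producing a model whose cluster_centers_ equal the supplied centroids.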

  • 2021-01-15 03:05

    Yes. Whether or not the sklearn.cluster.KMeans object is pickled (if you un-pickle it correctly, you'll be dealing with the "same" original object) does not affect your ability to use the predict method to cluster a new observation.

    An example:

    from sklearn.cluster import KMeans
    import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
    
    model = KMeans(n_clusters = 2, random_state = 100)
    X = [[0,0,1,0], [1,0,0,1], [0,0,0,1],[1,1,1,0],[0,0,0,0]]
    model.fit(X)
    

    Out:

    KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=2, n_init=10,
        n_jobs=1, precompute_distances='auto', random_state=100, tol=0.0001,
        verbose=0)
    

    Continue:

    joblib.dump(model, 'model.pkl')  
    model_loaded = joblib.load('model.pkl')
    
    model_loaded
    

    Out:

    KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=2, n_init=10,
        n_jobs=1, precompute_distances='auto', random_state=100, tol=0.0001,
        verbose=0)
    

    See how the n_clusters and random_state parameters are the same between the model and model_loaded objects? You're good to go.

    Predict with the "new" model:

    model_loaded.predict([[0,0,0,0]])  # predict expects a 2D array
    
    Out[64]: array([0])
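As a quick sanity check on the round trip, here is a sketch (same toy data as above, joblib used as the standalone package, temp-file path chosen for illustration) verifying that the reloaded model predicts identically to the original:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0, 0, 1, 0], [1, 0, 0, 1], [0, 0, 0, 1],
              [1, 1, 1, 0], [0, 0, 0, 0]])
model = KMeans(n_clusters=2, random_state=100, n_init=10).fit(X)

# round-trip the fitted model through disk
path = os.path.join(tempfile.mkdtemp(), 'model.pkl')
joblib.dump(model, path)
model_loaded = joblib.load(path)

# the reloaded model assigns new observations exactly as the original does
new_obs = np.array([[0, 0, 0, 0]])
same = bool((model.predict(new_obs) == model_loaded.predict(new_obs)).all())
```

Since the centroids are part of the serialized state, any new observation lands in the same cluster either way.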
    