I used the code below to create k-means clusters with scikit-learn.
kmean = KMeans(n_clusters=nclusters,n_jobs=-1,random_state=2376,max_iter=1000,n_init=100
According to the scikit-learn KMeans documentation, calling predict(X, sample_weight=None) after loading the pickle file with the stored KMeans model will predict the closest cluster each sample in X belongs to.
In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.
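To make the code book idea concrete, here is a minimal sketch that assigns new samples to the nearest centroid by hand and compares the result with predict. The toy data and the variable names (X_train, X_new, manual_labels) are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: two well-separated groups (made up for illustration).
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

X_new = np.array([[0.3, 0.3], [4.5, 5.5]])

# Squared Euclidean distance from every new sample to every centroid.
diffs = X_new[:, None, :] - model.cluster_centers_[None, :, :]
sq_dists = (diffs ** 2).sum(axis=2)

# Index of the nearest centroid, i.e. the closest code in the code book.
manual_labels = sq_dists.argmin(axis=1)
predicted_labels = model.predict(X_new)
```

On data like this, manual_labels and predicted_labels should agree, since predict simply returns the index of the nearest row of cluster_centers_.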
Practical note!
People often take the cluster labels from model.labels_, but those are the labels of the training data only. When predicting, make sure to use the returned result, such as pred_y in the following example:
from sklearn.cluster import KMeans
import pickle
# load the model (filename is the path to the pickled KMeans model)
with open(filename, 'rb') as f:
    model = pickle.load(f)
# predict using the loaded model; X holds the new samples
pred_y = model.predict(X)
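The difference between labels_ and the value returned by predict can be checked with a small sketch. The data here is invented, and the model is fitted in place rather than loaded from a file to keep the example self-contained.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented training data and new samples.
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
X_new = np.array([[0.1, 0.1], [4.8, 5.1], [5.2, 5.0]])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

# labels_ has one entry per training sample.
train_labels = model.labels_.copy()

# predict returns one label per new sample and leaves labels_ untouched.
pred_y = model.predict(X_new)
labels_after_predict = model.labels_
```

Here train_labels has four entries (one per training row) while pred_y has three (one per new row), which is why labels_ cannot stand in for the prediction.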
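This question is a bit old, but KMeans sets a cluster_centers_ attribute when it is fitted. If you already have the centroids, you can set the attribute yourself:
kmeans.cluster_centers_ = centroids_init
The model should be usable after this, although assigning a fitted attribute by hand relies on estimator internals and may behave differently across scikit-learn versions.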
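If all you need is the nearest-centroid assignment for a known set of centroids, one alternative that does not touch the estimator at all is sklearn.metrics.pairwise_distances_argmin. This is a sketch of a different technique than the attribute assignment above, and the centroid and sample values are made up.

```python
import numpy as np
from sklearn.metrics import pairwise_distances_argmin

# Known centroids (hypothetical values), one row per cluster.
centroids_init = np.array([[0.0, 0.0], [5.0, 5.0]])

# New observations to assign.
X = np.array([[0.2, -0.1], [4.7, 5.3], [1.0, 1.0]])

# For each row of X, the index of the closest row in centroids_init
# (Euclidean distance by default).
labels = pairwise_distances_argmin(X, centroids_init)
```

The result plays the same role as the output of predict: each entry is the index of the closest centroid.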
Yes. Whether or not the sklearn.cluster.KMeans object has been pickled makes no difference: if you un-pickle it correctly, you are dealing with the "same" original object, so you can still use its predict method to assign a new observation to a cluster.
An example:
from sklearn.cluster import KMeans
import joblib  # sklearn.externals.joblib has been removed from recent scikit-learn versions
model = KMeans(n_clusters = 2, random_state = 100)
X = [[0,0,1,0], [1,0,0,1], [0,0,0,1],[1,1,1,0],[0,0,0,0]]
model.fit(X)
Out:
KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=2, n_init=10,
n_jobs=1, precompute_distances='auto', random_state=100, tol=0.0001,
verbose=0)
Continue:
joblib.dump(model, 'model.pkl')
model_loaded = joblib.load('model.pkl')
model_loaded
Out:
KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=2, n_init=10,
n_jobs=1, precompute_distances='auto', random_state=100, tol=0.0001,
verbose=0)
See how the n_clusters and random_state parameters are the same between the model and model_loaded objects? You're good to go.
Predict with the "new" model:
model_loaded.predict([[0,0,0,0]])  # predict expects a 2D array: one row per sample
Out[64]: array([0])
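Comparing the printed parameters is a reasonable sanity check, but the fitted state can also be compared directly. The sketch below round-trips the model through pickle in memory (no file involved, which is an assumption made only to keep it self-contained) and checks that centroids and predictions match.

```python
import pickle
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0, 0, 1, 0], [1, 0, 0, 1], [0, 0, 0, 1],
              [1, 1, 1, 0], [0, 0, 0, 0]], dtype=float)
model = KMeans(n_clusters=2, n_init=10, random_state=100).fit(X)

# Serialize to bytes and back; no file is needed for this check.
model_loaded = pickle.loads(pickle.dumps(model))

# The restored model should carry the same centroids and give the same labels.
same_centers = np.allclose(model.cluster_centers_, model_loaded.cluster_centers_)
same_predictions = np.array_equal(model.predict(X), model_loaded.predict(X))
```

If both flags are True, the loaded model is interchangeable with the original for prediction. The same comparison works for a model restored with joblib.load.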