I used the code below to create k-means clusters with scikit-learn.
kmean = KMeans(n_clusters=nclusters, n_jobs=-1, random_state=2376, max_iter=1000, n_init=100)
According to the scikit-learn KMeans documentation, calling predict(X, sample_weight=None) on the model loaded from the pickle file predicts the closest cluster each sample in X belongs to.
In the vector quantization literature, cluster_centers_ is called the code book and each value returned by predict is the index of the closest code in the code book.
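To make the code book idea concrete, here is a minimal pure-Python sketch of a nearest-code lookup. It only illustrates the concept and is not scikit-learn's implementation; the function name nearest_code and the sample values are made up for this example.

```python
import math

def nearest_code(sample, code_book):
    """Return the index of the code (centroid) closest to sample."""
    distances = [math.dist(sample, code) for code in code_book]
    return distances.index(min(distances))

# Hypothetical code book: three 2-D centroids (the role cluster_centers_ plays)
code_book = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]

# Hypothetical new samples to assign
samples = [(0.2, -0.1), (4.7, 5.3), (0.4, 4.1)]

# Each label is an index into the code book, like the values predict returns
labels = [nearest_code(s, code_book) for s in samples]
print(labels)  # [0, 1, 2]
```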
Practical note!
People often take the cluster labels from model.labels_. However, labels_ only holds the labels of the samples the model was fitted on, so when predicting on new data make sure to use the value returned by predict, such as pred_y in the following example:
from sklearn.cluster import KMeans
import pickle
# load the model (filename is the path to the pickled KMeans model)
with open(filename, 'rb') as f:
    model = pickle.load(f)
# predict using the loaded model
pred_y = model.predict(X)
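
For a self-contained check of the point about labels_, the sketch below fits a small model, round-trips it through pickle in memory, and predicts on new points. The data, the variable names (X_train, X_new), and the parameter values are assumptions chosen for illustration, not taken from the question.

```python
import pickle

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical training data: two well-separated 2-D blobs of 50 points each
rng = np.random.RandomState(0)
X_train = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(5.0, 0.5, size=(50, 2)),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

# Round-trip through pickle (in memory here instead of a file)
model = pickle.loads(pickle.dumps(kmeans))

# Hypothetical new samples that were not part of the training data
X_new = np.array([[0.1, -0.2], [5.2, 4.9], [4.8, 5.1]])
pred_y = model.predict(X_new)

# labels_ still describes the 100 training samples, not the 3 new ones
print(len(model.labels_), len(pred_y))

# Each prediction is the index of the nearest row in cluster_centers_
dists = np.linalg.norm(X_new[:, None, :] - model.cluster_centers_[None, :, :], axis=2)
nearest = dists.argmin(axis=1)
print(np.array_equal(pred_y, nearest))
```

Because labels_ has one entry per training sample, it cannot be matched row by row against new data; the array returned by predict can.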