Sklearn Pipeline: How to build one for KMeans text clustering?
Question: I have text as shown:

    list1 = ["My name is xyz", "My name is pqr", "I work in abc"]

The above will be the training set for clustering text using KMeans.

    list2 = ["My name is xyz", "I work in abc"]

The above is my test set. I have built a vectorizer and the model as shown below:

    vectorizer = TfidfVectorizer(min_df = 0, max_df=0.5, stop_words = "english", charset_error = "ignore", ngram_range = (1,3))
    vectorized = vectorizer.fit_transform(list1)
    km=KMeans(n_clusters=2, init='k-means++', n_init=10,
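A minimal sketch of the same setup wrapped in a scikit-learn `Pipeline`, assuming a recent scikit-learn release. The vectorizer is fitted only on `list1`; `Pipeline.predict` then calls only `transform` on `list2` before `KMeans.predict`, so the test set is projected onto the training vocabulary instead of being refitted. Two assumptions differ from the question's code: `decode_error` is used because older releases called this argument `charset_error`, and `min_df=1` (the default, which keeps every term) replaces `min_df=0`, which recent versions may reject. `random_state=0` is added only for reproducibility, and the step names `"tfidf"` and `"kmeans"` are arbitrary.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

list1 = ["My name is xyz", "My name is pqr", "I work in abc"]  # training set
list2 = ["My name is xyz", "I work in abc"]                    # test set

pipeline = Pipeline([
    # decode_error replaces the old charset_error argument;
    # min_df=1 keeps every term, like the original min_df=0 intended
    ("tfidf", TfidfVectorizer(min_df=1, max_df=0.5, stop_words="english",
                              decode_error="ignore", ngram_range=(1, 3))),
    # n_init set explicitly; random_state fixed only for reproducible labels
    ("kmeans", KMeans(n_clusters=2, init="k-means++", n_init=10,
                      random_state=0)),
])

# fit: TfidfVectorizer.fit_transform on list1, then KMeans.fit on the result
pipeline.fit(list1)
train_labels = pipeline.named_steps["kmeans"].labels_

# predict: TfidfVectorizer.transform on list2, then KMeans.predict
test_labels = pipeline.predict(list2)

print("train:", train_labels)
print("test: ", test_labels)
```

Cluster numbers are arbitrary, so compare groupings rather than raw label values. Because both test sentences also appear in the training set, each should land in the same cluster as its training copy.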