cluster-analysis

How to make TF-IDF matrix dense?

被刻印的时光 ゝ submitted on 2020-08-17 04:58:22
Question: I am using TfidfVectorizer to convert a collection of raw documents to a matrix of TF-IDF features, which I then plan to feed into a k-means algorithm (which I will implement myself). In that algorithm I will have to compute distances between centroids (categories of articles) and data points (articles). I am going to use Euclidean distance, so I need these two entities to have the same dimension, in my case max_features. Here is what I have: tfidf = TfidfVectorizer(max_features=10, strip_accents=
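As a minimal sketch of the setup described in the question: TfidfVectorizer.fit_transform returns a SciPy sparse matrix, and calling .toarray() on it gives a dense NumPy array whose rows all have the same length (at most max_features). The toy corpus and the choice of two rows as starting centroids are assumptions made purely for illustration; the original documents and the truncated strip_accents argument are not reproduced here.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus; the asker's real documents are not shown.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]

tfidf = TfidfVectorizer(max_features=10)
X_sparse = tfidf.fit_transform(docs)  # SciPy sparse matrix, shape (n_docs, n_features)
X = X_sparse.toarray()                # dense ndarray with the same shape

# Assumption for illustration: use two documents as initial centroids,
# so centroids and data points share the same feature dimension.
centroids = X[[0, 2]]

# distances[i, j] = Euclidean distance between document i and centroid j
distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
labels = distances.argmin(axis=1)     # index of the nearest centroid per document
```

Densifying is only practical when the number of documents times max_features fits comfortably in memory; for a large vocabulary, keeping the matrix sparse may be preferable.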

scikit-learn: Finding the features that contribute to each KMeans cluster

孤街浪徒 submitted on 2020-07-04 06:25:12
Question: Say you have 10 features you are using to create 3 clusters. Is there a way to see the level of contribution each of the features has for each of the clusters? What I want to be able to say is that for cluster k1, features 1, 4, 6 were the primary features, whereas cluster k2's primary features were 2, 5, 7. This is the basic setup of what I am using: k_means = KMeans(init='k-means++', n_clusters=3, n_init=10) k_means.fit(data_features) k_means_labels = k_means.labels_ Answer 1: You can use Principal
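The answer above is cut off, so the following is not a reconstruction of it. It is one simple heuristic, sketched under stated assumptions: standardize the features, fit KMeans as in the question, and rank features per cluster by how far the cluster center lies from the overall mean (which is zero after standardization). The synthetic data, the random_state values, and the top-3 cutoff are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 300 points, 10 features, three groups that each
# differ from the rest on three distinct features (0-indexed).
rng = np.random.default_rng(0)
data_features = rng.normal(size=(300, 10))
data_features[:100, [0, 3, 5]] += 8.0
data_features[100:200, [1, 4, 6]] += 8.0
data_features[200:, [2, 7, 8]] += 8.0

# Standardize so every feature has mean 0 and unit variance; a center's
# absolute coordinate then measures its deviation from the global mean.
X = StandardScaler().fit_transform(data_features)

k_means = KMeans(init='k-means++', n_clusters=3, n_init=10, random_state=0)
k_means.fit(X)
k_means_labels = k_means.labels_

centers = k_means.cluster_centers_  # shape (n_clusters, n_features)

# For each cluster, the indices of the 3 features with the largest deviation.
top_features = np.argsort(-np.abs(centers), axis=1)[:, :3]

for k, feats in enumerate(top_features):
    print(f"cluster {k}: primary features {sorted(int(f) for f in feats)}")
```

This ranking only reflects where the centers sit; it says nothing about within-cluster spread, so it is best treated as a first look rather than a formal importance measure.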