Reveal k-modes cluster features

Posted by 五迷三道 on 2020-01-22 06:00:46

Question


I'm performing a cluster analysis on categorical data, hence I'm using the k-modes approach.

My data comes from a preference survey: how do you like hair and eyes?

Each respondent picks one answer from a fixed (multiple-choice) set of 4 possibilities.

I therefore get the dummies, apply k-modes, attach the clusters back to the initial df, and then plot them in 2D with PCA.

My code looks like:

import numpy as np
import pandas as pd
from kmodes import kmodes

df_dummy = pd.get_dummies(df)

# Transform into a NumPy array (reset_index() is dropped here because it
# would add the row index as an extra feature to be clustered)
x = df_dummy.values

km = kmodes.KModes(n_clusters=3, init='Huang', n_init=5, verbose=0)
clusters = km.fit_predict(x)
df_dummy['clusters'] = clusters


import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
pca = PCA(2)

# Turn the dummified df into two columns with PCA
plot_columns = pca.fit_transform(df_dummy.iloc[:, 0:12])

# Plot based on the two dimensions, and shade by cluster label
plt.scatter(x=plot_columns[:,1], y=plot_columns[:,0], c=df_dummy["clusters"], s=30)
plt.show()

and I can visualize the clusters as a 2D scatter plot.

Now my problem is: can I somehow reveal the distinctive features of each cluster? I.e., what are the main characteristics (maybe blond hair and blue eyes) of the group of green dots in the scatter plot?

I get that the clustering has happened, but I can't find a way to translate what the clustering actually means.

Should I play with the .labels_ attribute?


Answer 1:


Take a look at km.cluster_centroids_. This gives the mode of each variable for each cluster: one row per cluster, with the columns in the same order as the features passed to fit_predict.
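
As a rough sketch of how the modes could be read back in terms of the original survey answers, the snippet below groups the raw (non-dummified) answers by cluster label and takes the most frequent value per question. The survey data, the column names, and the hard-coded `labels` list are all made up for illustration; in the question's code the labels would come from `km.labels_` (or the `clusters` array) instead.

```python
import pandas as pd

# Hypothetical survey answers; column names and categories are invented.
df = pd.DataFrame({
    "hair": ["blond", "blond", "brown", "brown", "black", "blond"],
    "eyes": ["blue", "blue", "brown", "brown", "green", "blue"],
})

# Stand-in for the fitted model's labels (km.labels_ in the question's code).
labels = [0, 0, 1, 1, 2, 0]

# Most frequent answer to each question within each cluster.
cluster_modes = (
    df.assign(cluster=labels)
      .groupby("cluster")
      .agg(lambda s: s.mode().iloc[0])
)
print(cluster_modes)

# Share of each hair answer per cluster, to see how dominant a mode is.
hair_share = pd.crosstab(
    index=pd.Series(labels, name="cluster"),
    columns=df["hair"],
    normalize="index",
)
print(hair_share)
```

Grouping the raw answers keeps the output readable ("blond", "blue") rather than a row of 0/1 values per dummy column. The same idea should apply to `km.cluster_centroids_` directly by wrapping it in a DataFrame whose columns are the feature names used for fitting.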



Source: https://stackoverflow.com/questions/41827660/reveal-k-modes-cluster-features
