Applying Principal Component Analysis (PCA)

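Principal component analysis
PCA dimensionality reduction
Notes:
KNN (K-Nearest Neighbors) is a supervised algorithm; K is the number of nearest neighbors consulted.
KMeans is an unsupervised algorithm; K is the final number of clusters the data is split into.
Decision boundary: the border between the regions of feature space that a model assigns to different classes or clusters.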
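
To make the KNN/KMeans distinction in the notes concrete, here is a minimal sketch on the digits dataset. The values `n_neighbors=5`, `n_init=10`, and `random_state=0` are illustrative assumptions, not settings taken from the original notes.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X, y = datasets.load_digits(return_X_y=True)

# Supervised: KNN is fitted with the labels y; n_neighbors is the K in KNN.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
knn_pred = knn.predict(X[:5])

# Unsupervised: KMeans only sees X; n_clusters is the K in KMeans.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)
cluster_ids = kmeans.predict(X[:5])

print(knn_pred)     # predicted digit labels
print(cluster_ids)  # cluster indices; these are arbitrary and are not digit labels
```

The cluster indices returned by KMeans have no fixed relation to the digit classes, so they cannot be compared with `y` directly.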
datasets:
- Load the dataset: load_digits()
  - .data / .target / .target_names
  - .images: 1797 images, each 8x8 pixels
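
The attributes listed above can be inspected directly. This is a small sketch; the variable names `digits` and `same` are chosen here for illustration.

```python
from sklearn import datasets

digits = datasets.load_digits()

print(digits.data.shape)     # (1797, 64)
print(digits.images.shape)   # (1797, 8, 8)
print(digits.target.shape)   # (1797,)
print(digits.target_names)   # [0 1 2 3 4 5 6 7 8 9]

# Each row of .data is the matching 8x8 image flattened to 64 values.
same = bool((digits.data[0] == digits.images[0].ravel()).all())
print(same)                  # True
```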
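PCA dimensionality reduction:
- fit_transform() fits the model and returns the reduced data
- fit() only fits the model and returns the fitted estimator itself; call transform() afterwards to get the reduced data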
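
The difference between the two calls can be checked with a short sketch. `svd_solver="full"` is an assumption added here so that both fits are deterministic; the default solver choice depends on the scikit-learn version.

```python
import numpy as np
from sklearn import datasets, decomposition

X = datasets.load_digits().data

# One step: fit the model and get the reduced data back.
pca_a = decomposition.PCA(n_components=2, svd_solver="full")
reduced_a = pca_a.fit_transform(X)

# Two steps: fit() returns the estimator, transform() returns the data.
pca_b = decomposition.PCA(n_components=2, svd_solver="full")
returned = pca_b.fit(X)
reduced_b = pca_b.transform(X)

print(returned is pca_b)                          # fit() hands back the estimator itself
print(reduced_a.shape)                            # (1797, 2)
print(np.allclose(reduced_a, reduced_b, atol=1e-6))
print(pca_a.explained_variance_ratio_)            # share of variance kept per component
```

The two-step form is the one to use when the same fitted projection has to be applied to new data later.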

Visualization:

Grayscale image: plt.imshow(image,cmap=plt.cm.gray_r)
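
As a minimal sketch of that call, the following shows one sample from the digits dataset. The choice of `index = 0` and the figure size are arbitrary assumptions for illustration.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()
index = 0  # arbitrary sample chosen for illustration

fig, ax = plt.subplots(figsize=(2, 2))
# gray_r is the reversed grayscale map, so high pixel values are drawn dark.
ax.imshow(digits.images[index], cmap=plt.cm.gray_r)
ax.set_title(f"label: {digits.target[index]}")
ax.axis("off")
plt.show()
```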

Clustering the handwritten digits:

 # Handwritten digits dataset: 1797 images, 8x8 pixels each
 import numpy as np
 import matplotlib.pyplot as plt
 from sklearn import datasets, decomposition
 from sklearn.cluster import KMeans
 digits_data = datasets.load_digits()   # load the dataset
 X = digits_data.data                   # X.shape => (1797, 64)
 y = digits_data.target
 # Reduce the 64 features to 2 dimensions
 estimator = decomposition.PCA(n_components=2)
 reduce_data = estimator.fit_transform(X)
 # Train the clustering model on the reduced data
 model = KMeans(n_clusters=10).fit(reduce_data)
 # Coordinate grid covering the reduced data, with a margin of 1
 x_min,x_max = reduce_data[:,0].min() -1,reduce_data[:,0].max() +1
 y_min,y_max = reduce_data[:,1].min() -1,reduce_data[:,1].max() +1
 xx,yy = np.meshgrid(np.arange(x_min,x_max,.05),np.arange(y_min,y_max,.05))
 # Predict the cluster of every grid point and visualize the result
 result = model.predict(np.c_[xx.ravel(),yy.ravel()])
 result = result.reshape(xx.shape)
 plt.figure(figsize=(10,5))
 plt.contourf(xx,yy,result,cmap=plt.cm.Greys)              # cluster regions
 plt.scatter(reduce_data[:,0],reduce_data[:,1],c=y,s=15)   # samples colored by true digit
 center = model.cluster_centers_
 plt.scatter(center[:,0],center[:,1],marker='p',lw=2,color='b',edgecolors='w',zorder=20)
 plt.xlim(x_min,x_max)
 plt.ylim(y_min,y_max)
 plt.show()

Evaluation:

Random forest classification
Cross-validation (cv = 5)

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
estimator = decomposition.PCA(n_components=5) # reduce the 64 features to 5
X_pca = estimator.fit_transform(X)
model = RandomForestClassifier()
cross_val_score(model,X_pca,y,cv=5).mean() # mean accuracy on the reduced features
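
To see what the reduction costs in accuracy, one option is to score the same classifier on the raw 64 features and on 5 PCA components. The sketch below is one way to set this up, not the original author's method. It places PCA inside a pipeline so that the projection is refitted on each training fold, and `random_state=0` is an assumption added for repeatability.

```python
from sklearn import datasets, decomposition
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = datasets.load_digits(return_X_y=True)

# Baseline: random forest on all 64 pixel features.
raw_model = RandomForestClassifier(random_state=0)

# PCA to 5 components, then the same classifier; PCA is fitted per fold.
pca_model = make_pipeline(
    decomposition.PCA(n_components=5),
    RandomForestClassifier(random_state=0),
)

raw_scores = cross_val_score(raw_model, X, y, cv=5)
pca_scores = cross_val_score(pca_model, X, y, cv=5)
print("raw features:", raw_scores.mean())
print("5 PCA components:", pca_scores.mean())
```

Fitting PCA on the full dataset before cross-validation lets the held-out folds influence the projection. The pipeline form avoids that at the price of refitting PCA five times.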

Summary: original dataset -> PCA dimensionality reduction (as a data preprocessing step) -> clustering

Source: CSDN

Author: 啊嘞随便什么昵称都可以

Link: https://blog.csdn.net/Romaga/article/details/104718747
