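The idea behind PCA is to map n-dimensional features onto k dimensions (k < n), where the k dimensions are brand-new, mutually orthogonal features. These k features are called the principal components. They are constructed anew as linear combinations of the original features, rather than obtained by simply discarding n-k of the original n features.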
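The mapping from n to k dimensions can be sketched as follows. This is a minimal illustration rather than a reference implementation: the helper name `pca_project`, the random test data, and the mixing matrix are all assumptions made for the example.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X (samples) onto the top-k principal directions.

    Hypothetical helper for illustration only.
    """
    X_centered = X - X.mean(axis=0)             # remove the mean of each feature
    cov = np.cov(X_centered, rowvar=False)      # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest eigenvalues
    W = eigvecs[:, order]                       # n x k, columns are orthonormal
    return X_centered @ W, W

# Made-up data: 50 samples with n = 3 correlated features
rng = np.random.default_rng(0)
mix = np.array([[2.0, 0.5, 0.0],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 0.2]])
X = rng.normal(size=(50, 3)) @ mix

Z, W = pca_project(X, k=2)
print(Z.shape)                            # (50, 2)
print(np.allclose(W.T @ W, np.eye(2)))    # True
```

Each column of `W` assigns a weight to every original feature, so each new feature mixes all n inputs instead of copying a subset of them.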
(1) Compute the covariance matrix of the data. For background, see:
https://blog.csdn.net/Mr_HHH/article/details/78490576
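As a rough sketch of step (1), the covariance matrix can be built by hand from centered data and compared against `np.cov`. The numbers in `data` are made up for illustration, and the layout (one row per feature, one column per sample) is an assumption chosen to match `np.cov`'s default.

```python
import numpy as np

# Made-up data: 2 features (rows) x 4 samples (columns)
data = np.array([[2.0, 4.0, 6.0, 8.0],
                 [1.0, 3.0, 2.0, 6.0]])
n_samples = data.shape[1]

# Center each feature, then average the outer products with divisor n - 1
centered = data - data.mean(axis=1, keepdims=True)
cov_manual = centered @ centered.T / (n_samples - 1)

# np.cov treats rows as variables by default and also divides by n - 1
cov_numpy = np.cov(data)

print(cov_manual)
print(np.allclose(cov_manual, cov_numpy))   # True
```

Dropping the `1 / (n - 1)` factor rescales the eigenvalues but leaves the eigenvectors, and therefore the principal directions, unchanged.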
(2) Compute the eigenvalues and eigenvectors of the covariance matrix
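Step (2) might look like the sketch below. It uses `np.linalg.eigh`, which is intended for symmetric matrices such as a covariance matrix; the 2 x 2 matrix `C` is an illustrative stand-in, not data from this post.

```python
import numpy as np

# Illustrative symmetric matrix standing in for a covariance matrix
C = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns eigenvalues in ascending order; eigenvectors are the columns
eigvals, eigvecs = np.linalg.eigh(C)

# Reorder so that the largest eigenvalue (first principal direction) comes first
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)   # approximately 3 and 1

# Each column v satisfies C v = lambda v
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(C @ v, lam * v)

# The eigenvectors of a symmetric matrix are orthonormal
print(np.allclose(eigvecs.T @ eigvecs, np.eye(2)))   # True
```

The sign of each eigenvector is arbitrary, so two correct runs may return directions that differ by a factor of -1.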
Python sample code:
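import numpy as np

# Original data: two features, ten samples
x = np.array([0.69, -1.31, 0.39, 0.09, 1.29, 0.49, 0.19, -0.81, -0.31, -0.71])
y = np.array([0.49, -1.21, 0.99, 0.29, 1.09, 0.79, -0.31, -0.81, -0.31, -1.01])

# Remove the mean of each feature; rows are features, columns are samples
ma = np.array([x - x.mean(), y - y.mean()])

# Scatter matrix, i.e. the covariance matrix without the 1/(n-1) factor.
# The scaling changes the eigenvalues but not the eigenvectors.
cov = ma.dot(ma.T)
print("Scatter matrix (unnormalized covariance)")
print(cov)

print("------------------Eigenvalues and eigenvectors of the matrix-----------------------")
eigenvalue, featurevector = np.linalg.eig(cov)
print("eigenvalue=", eigenvalue)
print("featurevector=", featurevector)

# Select the eigenvector (a column of featurevector) with the largest eigenvalue
max_featurevector = featurevector[:, np.argmax(eigenvalue)]

# Project the centered data onto that principal direction
pca_data = ma.T.dot(max_featurevector.reshape(-1, 1))
print("pca")
print(pca_data)

Output (the sign of an eigenvector, and therefore of the projection, can differ between NumPy/LAPACK builds):
Scatter matrix (unnormalized covariance)
[[5.549 5.539]
 [5.539 6.449]]
------------------Eigenvalues and eigenvectors of the matrix-----------------------
eigenvalue= [ 0.44175059 11.55624941]
featurevector= [[-0.73517866 -0.6778734 ]
 [ 0.6778734  -0.73517866]]
pca
[[-0.82797019]
 [ 1.77758033]
 [-0.9921975 ]
 [-0.27421042]
 [-1.67580143]
 [-0.91294911]
 [ 0.09910944]
 [ 1.14457217]
 [ 0.43804614]
 [ 1.22382056]]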
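A common follow-up check is how much of the total variance the retained component carries. The sketch below reuses the two eigenvalues printed in the output above; the variable name `explained_ratio` is an assumption for this example.

```python
import numpy as np

# Eigenvalues copied from the output above
eigenvalue = np.array([0.44175059, 11.55624941])

# Share of the total variance along each eigenvector
explained_ratio = eigenvalue / eigenvalue.sum()
print(explained_ratio)

# Share kept when only the component with the largest eigenvalue is retained
kept = explained_ratio.max()
print(round(kept, 3))
```

With these eigenvalues the larger component accounts for roughly 96% of the variance, which suggests that reducing this data from two dimensions to one loses little information.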
Source: CSDN
Author: 蓝鲸123
Link: https://blog.csdn.net/TH_NUM/article/details/103845005