Matlab Principal Component Analysis (eigenvalues order)

旧巷少年郎 2020-12-22 14:01

I want to use the "princomp" function of Matlab but this function gives the eigenvalues in a sorted array. This way I can't find out to which column corresponds which eig

2 answers
  •  生来不讨喜
    2020-12-22 14:42

    With PCA, each principal component returned is a linear combination of the original columns/dimensions, so a component does not correspond to any single column. Perhaps an example will clear up the misunderstanding.

    Let's consider the Fisher-Iris dataset, comprising 150 instances and 4 dimensions, and apply PCA to the data. To make things easier to understand, I first zero-center the data before calling the PCA function:

    load fisheriris
    X = bsxfun(@minus, meas, mean(meas));    %# so that mean(X) is the zero vector
    
    [PC, score, latent] = princomp(X);       %# newer releases provide pca(X) as the replacement
    

    Let's look at the first returned principal component (1st column of the PC matrix):

    >> PC(:,1)
          0.36139
        -0.084523
          0.85667
          0.35829
    

    This is expressed as a linear combination of the original dimensions, i.e.:

    PC1 = 0.36139*dim1 - 0.084523*dim2 + 0.85667*dim3 + 0.35829*dim4
    

    Therefore, to express the same data in the new coordinate system formed by the principal components, the new first dimension should be a linear combination of the original ones according to the above formula.

    We can compute this simply as X*PC, which is exactly what is returned in the second output of PRINCOMP (score). To confirm this, try:

    >> all(all( abs(X*PC - score) < 1e-10 ))
        1
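
    The check above works because X was centered first. As a minimal sketch of that point, the snippet below uses a small made-up data matrix (X0 and its values are illustrative assumptions, not part of the original post) and only base functions, so it does not need PRINCOMP. It writes the first score column out as an explicit weighted sum and shows that projecting uncentered data merely shifts every row by a constant offset:

    ```matlab
    % Made-up 10x3 data set (illustrative values, not from the original post).
    X0 = [2.5 2.4 1.2; 0.5 0.7 0.3; 2.2 2.9 1.1; 1.9 2.2 0.9; 3.1 3.0 1.6; ...
          2.3 2.7 1.0; 2.0 1.6 1.1; 1.0 1.1 0.4; 1.5 1.6 0.8; 1.1 0.9 0.6];
    Xc = bsxfun(@minus, X0, mean(X0));    % zero-centered copy

    % Principal directions from the covariance matrix, largest variance first.
    [V, D] = eig(cov(X0));
    [~, order] = sort(diag(D), 'descend');
    V = V(:, order);

    % First score column written out as a weighted sum of the centered columns.
    pc1_manual = V(1,1)*Xc(:,1) + V(2,1)*Xc(:,2) + V(3,1)*Xc(:,3);
    scores = Xc * V;
    err_manual = max(abs(pc1_manual - scores(:,1)));

    % Projecting the uncentered data shifts every row by the same offset.
    offset = mean(X0) * V;                % 1x3 row vector
    err_offset = max(max(abs(X0*V - bsxfun(@plus, scores, offset))));
    ```

    Both err_manual and err_offset should come out at round-off level, so forgetting to center changes the scores only by a constant shift per component.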
    

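    Finally, the importance of each principal component can be determined by how much variance of the data it explains. This is returned by the third output of PRINCOMP (latent). The values in latent are sorted in descending order, and latent(k) belongs to the k-th column of PC, not to the k-th original column.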
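
    To turn those variances into something easier to read, one option is to express them as percentages of the total. The sketch below is hedged: it uses a small made-up matrix X0 and assumes latent equals the eigenvalues of cov(X0) sorted in descending order; the 95% threshold is an arbitrary illustrative choice.

    ```matlab
    % Made-up 10x3 data set (illustrative values, not from the original post).
    X0 = [2.5 2.4 1.2; 0.5 0.7 0.3; 2.2 2.9 1.1; 1.9 2.2 0.9; 3.1 3.0 1.6; ...
          2.3 2.7 1.0; 2.0 1.6 1.1; 1.0 1.1 0.4; 1.5 1.6 0.8; 1.1 0.9 0.6];

    % Assumption: latent is the set of covariance eigenvalues, largest first.
    latent = sort(eig(cov(X0)), 'descend');

    explained  = 100 * latent / sum(latent);   % percent of total variance per component
    cumulative = cumsum(explained);            % running total over the components

    % Smallest number of components whose cumulative share reaches 95%
    % (the threshold is an arbitrary choice for illustration).
    k = find(cumulative >= 95, 1);
    ```

    The total of latent equals the sum of the per-column variances of the data, which is why the percentages add up to 100.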


    We can compute the PCA of the data ourselves without using PRINCOMP:

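    [V, E] = eig( cov(X) );                   %# eigenvectors in columns of V, eigenvalues on diag(E)
    [E, order] = sort(diag(E), 'descend');    %# sort eigenvalues, remember the permutation
    V = V(:,order);                           %# reorder eigenvectors to match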
    

    The columns of V, the eigenvectors of the covariance matrix, are the principal components (same as PC above, although the sign can be inverted), and the corresponding eigenvalues E represent the amount of variance explained (same as latent). Note that it is customary to sort the principal components by their eigenvalues, which is why the returned order has nothing to do with the order of the original columns. As before, to express the data in the new coordinates, we simply compute X*V (this should be the same as score above, if you make sure to match the signs).
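
    As a rough cross-check of the sign remark, the eigendecomposition can be compared against an SVD of the centered data, which gives the same directions up to sign. This is only a sketch on a made-up matrix X0 (illustrative values, base functions only). Flipping columns so that matching directions have a positive dot product is one arbitrary convention among several, and dominant is a hypothetical helper variable naming the original column with the largest weight in each component:

    ```matlab
    % Made-up 10x3 data set (illustrative values, not from the original post).
    X0 = [2.5 2.4 1.2; 0.5 0.7 0.3; 2.2 2.9 1.1; 1.9 2.2 0.9; 3.1 3.0 1.6; ...
          2.3 2.7 1.0; 2.0 1.6 1.1; 1.0 1.1 0.4; 1.5 1.6 0.8; 1.1 0.9 0.6];
    Xc = bsxfun(@minus, X0, mean(X0));    % zero-centered copy
    n  = size(Xc, 1);

    % Route 1: eigendecomposition of the covariance matrix, sorted descending.
    [V, D] = eig(cov(Xc));
    [E, order] = sort(diag(D), 'descend');
    V = V(:, order);

    % Route 2: SVD of the centered data; singular values are already sorted.
    [~, S, W] = svd(Xc, 'econ');
    E_svd = diag(S).^2 / (n - 1);         % singular values -> variances

    % Match signs: flip each column of W that points opposite to its partner in V.
    W = bsxfun(@times, W, sign(sum(V .* W, 1)));

    % The variance of each score column equals the matching eigenvalue.
    score_var = var(Xc * V)';

    % Original column carrying the largest absolute weight in each component.
    [~, dominant] = max(abs(V), [], 1);
    ```

    After the sign flip, V and W should agree to round-off, and E, E_svd and score_var should hold the same values. The dominant vector is the closest thing to "which column belongs to which eigenvalue", but it only reports the heaviest contributor to each combination.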
