Significance of 99% of variance covered by the first component in PCA

死守一世寂寞 2021-01-22 22:31

What does it mean when the first component accounts for more than 99% of the total variance in PCA? I have a feature vector of size 500X1000 on which I used Mat

1 Answer
  • 2021-01-22 22:52

    The explained value tells you what share of the data's total variance a principal component captures, that is, how accurately you could represent the data using only that component. In your case it means that the first principal component alone captures 99% of the variance, so it describes the data very accurately.
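    As a formula, and assuming that "explained" here is the usual percentage-of-variance output of a PCA routine, the value for component i can be written in terms of the eigenvalues of the data's covariance matrix. The symbols \lambda_i (eigenvalues, sorted in decreasing order) and p (number of features) are notation introduced for this sketch, not taken from the question:

```latex
\[
  \text{explained}_i \;=\; 100 \cdot \frac{\lambda_i}{\sum_{j=1}^{p} \lambda_j},
  \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0 .
\]
```

    A value above 99 for i = 1 therefore says that \lambda_1 is more than 99 times as large as all the other eigenvalues combined.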

    Let's make a 2D example. Imagine you have data that is 100x2 and you run PCA on it.

    The result could look something like this (image taken from the internet):

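    [Figure: scatter plot of 2D data points in blue, with a big green arrow along the PCA 1st dimension and a small green arrow along the PCA 2nd dimension]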

    This data will give you an explained value for the first principal component (the PCA 1st dimension, the big green arrow in the figure) of around 90%.
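    To make the 2D case concrete, here is a minimal sketch in plain Python that computes the explained percentages for a 100x2 data set. The function names (explained_2d, make_cloud) and the spreads 3.0, 1.0 and 0.1 are made up for illustration; they are not from the original question or from any PCA library. For 2D data the covariance matrix is 2x2, so its eigenvalues have a closed form and no linear algebra package is needed.

```python
import math
import random


def explained_2d(points):
    """Return [pct1, pct2]: percentage of total variance along each principal axis."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = (sxx + syy) / 2
    half_gap = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = mean + half_gap, mean - half_gap
    total = lam1 + lam2
    return [100 * lam1 / total, 100 * lam2 / total]


def make_cloud(n, spread_along, spread_across, seed=0):
    """Hypothetical test data: a Gaussian cloud stretched along a 45-degree line."""
    rng = random.Random(seed)
    c = s = math.sqrt(0.5)  # cos and sin of 45 degrees
    points = []
    for _ in range(n):
        a = rng.gauss(0, spread_along)   # position along the long axis
        b = rng.gauss(0, spread_across)  # position along the short axis
        points.append((a * c - b * s, a * s + b * c))
    return points


# A fairly wide cloud (assumed spreads 3.0 and 1.0) and a very thin one (3.0 and 0.1).
wide = explained_2d(make_cloud(100, 3.0, 1.0))
thin = explained_2d(make_cloud(100, 3.0, 0.1))
print("wide cloud:", wide)
print("thin cloud:", thin)
```

    With these assumed spreads, the wide cloud should land in the neighborhood of 90% for the first component (the true variances are 9 and 1), while the thin cloud should come out above 99%, which is the situation described in the question.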

    What does that mean?

    It means that if you project all your data onto that line, the projected points retain about 90% of the total variance of the original points (of course, you will lose the information in the PCA 2nd dimension direction).
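    The projection itself can be sketched the same way. The example below, again in plain Python with made-up names (first_axis, project_and_reconstruct, retained_percent) and an assumed random data set, projects every point onto the first principal axis, rebuilds it from that single coordinate, and measures how much of the total variance survives. It is an illustration of the idea, not the code behind the figure.

```python
import math
import random


def first_axis(points):
    """Return the data mean and a unit vector along the first principal axis (2D only)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Orientation of the largest-variance direction of a symmetric 2x2 scatter matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))


def project_and_reconstruct(points):
    """Keep only the coordinate along the first axis, then map back to 2D."""
    (mx, my), (ux, uy) = first_axis(points)
    recon = []
    for x, y in points:
        t = (x - mx) * ux + (y - my) * uy  # score on the first component
        recon.append((mx + t * ux, my + t * uy))
    return recon


def retained_percent(points, recon):
    """Percentage of total variance kept after the one-component reconstruction."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    total = sum((x - mx) ** 2 + (y - my) ** 2 for x, y in points)
    lost = sum((x - rx) ** 2 + (y - ry) ** 2
               for (x, y), (rx, ry) in zip(points, recon))
    return 100 * (1 - lost / total)


# Assumed data: 100 points, spread 3.0 along a 30-degree line and 1.0 across it.
rng = random.Random(1)
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
points = []
for _ in range(100):
    a, b = rng.gauss(0, 3.0), rng.gauss(0, 1.0)
    points.append((a * c - b * s, a * s + b * c))

recon = project_and_reconstruct(points)
kept = retained_percent(points, recon)
print("variance retained by the first component (%):", kept)
```

    The retained percentage computed this way equals the explained value of the first component, and whatever is lost is exactly the variance along the second direction.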

    In your example, 99% visually means that almost all the blue points lie along the big green arrow, with very little variation in the direction of the small green arrow.

    Of course it is way more difficult to visualize with 1000 dimensions instead of 2, but I hope you understand.
