scikit-learn PCA: matrix transformation produces PC estimates with flipped signs

不思量自难忘° 2021-01-05 12:46

I'm using scikit-learn to perform PCA on this dataset. The scikit-learn documentation states that

"Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction)."
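A minimal sketch of the kind of comparison that surfaces this (synthetic data as a hypothetical stand-in for the dataset in question):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(50, 3)                 # hypothetical stand-in for the dataset
Xc = X - X.mean(axis=0)              # PCA centers the data internally

pca = PCA(n_components=2).fit(X)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

print(pca.components_)   # may match Vt[:2] only up to a sign flip per row
print(Vt[:2])
```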

2 Answers
  • 2021-01-05 13:28

    SVD decompositions are not guaranteed to be unique: only the singular values are guaranteed to be identical, and different implementations of svd() can produce singular vectors with different signs. Any of the eigenvectors can have a flipped sign and will still produce identical results when the data is transformed and then transformed back into the original space. Most algorithms in sklearn that use an SVD decomposition call the function sklearn.utils.extmath.svd_flip() to correct for this and enforce an identical convention across algorithms. For historical reasons, PCA() never got this fix (though maybe it should...)

    In general, this is not something to worry about - just a limitation of the SVD algorithm as typically implemented.
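    To see the non-uniqueness concretely, here is a minimal sketch with synthetic data (not the asker's dataset): negating a singular-vector pair yields an equally valid SVD with an identical reconstruction, and svd_flip() picks one sign convention deterministically.

    ```python
    import numpy as np
    from sklearn.utils.extmath import svd_flip

    rng = np.random.RandomState(0)
    X = rng.randn(20, 3)
    Xc = X - X.mean(axis=0)                  # center, as PCA does

    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Negating a (u_i, v_i) pair is still a valid SVD: the two sign
    # flips cancel in the product U @ diag(S) @ Vt.
    assert np.allclose(U @ np.diag(S) @ Vt, (-U) @ np.diag(S) @ (-Vt))

    # svd_flip() resolves the ambiguity with a fixed sign convention.
    U_fixed, Vt_fixed = svd_flip(U, Vt)
    ```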

    On an additional note, I find assigning importance to PC weights (and parameter weights in general) dangerous because of exactly these kinds of issues. Numerical/implementation details should not influence your analysis results, but it is often hard to tell what is a result of the data and what is a result of the algorithms you use for exploration. I know this is a homework assignment, not a choice, but it is important to keep these things in mind!

  • 2021-01-05 13:31

    You're doing nothing wrong.

    What the documentation is warning you about is that repeated calls to fit may yield different principal components - not how they relate to another PCA implementation.

    Having a flipped sign on all components doesn't make the result wrong - the result is right as long as it fulfills the definition: each component is chosen such that it captures the maximum amount of variance in the data. As it stands, it seems the projection you got is simply mirrored; it still fulfills the definition and is thus correct.

    If, beyond correctness, you're worried about consistency between implementations, you can simply multiply the components by -1 when necessary.
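    For instance (a minimal sketch with synthetic data; the convention shown, making the largest-magnitude loading of each component positive, is just one common choice):

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.RandomState(0)
    X = rng.randn(100, 4)             # hypothetical data

    pca = PCA(n_components=2).fit(X)
    scores = pca.transform(X)

    # Pick a sign per component so its largest-magnitude loading is
    # positive, then flip component and scores together so the
    # projection itself is unchanged.
    idx = np.argmax(np.abs(pca.components_), axis=1)
    signs = np.sign(pca.components_[np.arange(len(idx)), idx])
    components = pca.components_ * signs[:, np.newaxis]
    scores = scores * signs
    ```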
