Removing Variables using PCA in R

再見小時候 2021-01-16 04:06

I tried searching for this but could not find the info. I am conducting a linear regression using 10 variables (1 y variable and 9 x variables). All the variables are correl

1 answer
  • 2021-01-16 04:20

    So it sounds like you are facing a model selection problem: you want to choose the best variables without overfitting, correct?

    PCA may not be the way to go for feature selection. Here's one discussion of it:

    https://stats.stackexchange.com/questions/27300/using-pca-for-feature-selection

    The usual purpose of PCA is dimensionality reduction, i.e. describing relationships in your data using fewer dimensions than are actually present. A component that explains a lot of variance could be a good feature, but not necessarily: PCA never looks at your y variable, so it's not exactly geared towards that purpose.
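
    To make that concrete, here is a minimal sketch using the built-in iris data, treating three of its columns as stand-in predictors (my choice for illustration, not your data). It shows that each principal component is a weighted mix of all the original variables, so keeping only the first component or two does not actually remove any variable from your data:

    ```r
    # Illustrative predictors only; swap in your own 9 x variables.
    X <- iris[, c("Sepal.Width", "Petal.Length", "Petal.Width")]

    # Center and scale so no variable dominates just because of its units.
    pca <- prcomp(X, center = TRUE, scale. = TRUE)

    # Share of the predictors' variance captured by each component.
    var_explained <- pca$sdev^2 / sum(pca$sdev^2)
    print(round(var_explained, 3))

    # Loadings: each column (component) has a weight on every original variable.
    print(round(pca$rotation, 3))
    ```

    Note that the variance being explained here is variance in the x variables; the response is never involved.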

    If what you want to do is pare down the number of features in your model, I would suggest using an information criterion like the AIC. You can easily use this in R with the stepAIC function like so:

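    library(MASS)
    # Full model: all main effects plus all two-way interactions
    fit <- lm(Sepal.Length ~ .^2, data = iris)
    # Backward elimination, guided by AIC
    step <- stepAIC(fit, direction = "backward")
    # Summary of the initial model, the final model, and the terms dropped
    step$anova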
    >> Stepwise Model Path 
    >> Analysis of Deviance Table
    >> 
    >> Initial Model:
    >> Sepal.Length ~ (Sepal.Width + Petal.Length + Petal.Width + Species)^2
    >> 
    >> Final Model:
    >> Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species + 
    >>   Sepal.Width:Petal.Width + Petal.Length:Species + Petal.Width:Species
    

    At each step it drops one more term, the one whose removal lowers the AIC the most, and it stops once no removal improves the AIC. There is a lot more that goes into model selection, and a lot of things to consider and adjust, so this is not a prescriptive guide; I just wanted to bring it up as something to consider.
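
    As one example of something you could adjust, here is a rough sketch on the same iris model. It checks that the selected model really does have a lower AIC than the full one, and then reruns the search with the `k` argument of stepAIC set to log(n), which gives a BIC-style penalty that is harsher on extra terms. The variable names are just mine for illustration:

    ```r
    library(MASS)

    # Same full model as above: main effects plus two-way interactions.
    full <- lm(Sepal.Length ~ .^2, data = iris)

    # Default penalty (k = 2) is the AIC; trace = FALSE hides the step log.
    by_aic <- stepAIC(full, direction = "backward", trace = FALSE)

    # Compare the two models on the criterion that was minimized.
    print(c(full = AIC(full), selected = AIC(by_aic)))

    # BIC-style penalty: k = log(number of observations).
    by_bic <- stepAIC(full, direction = "backward", trace = FALSE,
                      k = log(nrow(iris)))

    # Inspect which terms survive under each penalty.
    print(formula(by_aic))
    print(formula(by_bic))
    ```

    Whichever penalty you pick, it is worth checking the final model on held-out data, since stepwise selection can still overfit.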
