Removing Variables using PCA in R

后端未结


I tried searching for this but could not find the info. I am conducting a linear regression using 10 variables (1 y variable and 9 x variables). All the variables are correl


1 Answer

温柔的废话

2021-01-16 04:20
So it sounds like you are facing a model selection problem: you want to choose the best variables without overfitting, correct?

PCA may not be the way to go for feature selection. Here's one discussion of it:

https://stats.stackexchange.com/questions/27300/using-pca-for-feature-selection

The usual purpose of PCA is dimensionality reduction, i.e. describing the relationships in your data using fewer dimensions than are actually present. A component that explains a lot of variance could be a good feature, but not necessarily: PCA never looks at the response variable, so it isn't geared towards that purpose.
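To make that point concrete, here is a minimal sketch using base R's `prcomp` on the built-in `iris` data. The choice of `Sepal.Length` as the response and the other three numeric columns as predictors is only an assumption for illustration. It prints the variance explained by each component, the loadings, and how strongly each component's scores relate to the response, so you can compare the two rankings yourself.

```r
# Predictors from the built-in iris data (assumption: Sepal.Length is the response)
X <- iris[, c("Sepal.Width", "Petal.Length", "Petal.Width")]

# PCA on centered and scaled predictors
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Loadings: each component is a weighted combination of the original variables
print(round(pca$rotation, 3))

# Squared correlation of each component's scores with the response;
# this ranking does not have to match the variance ranking above
r2_with_y <- cor(pca$x, iris$Sepal.Length)^2
print(round(r2_with_y, 3))
```

Note that keeping only some components does not remove any original variable from the analysis, since every component is built from all of them.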

If what you want to do is pare down the number of features in your model, I would suggest using an information criterion like the AIC. You can easily use this in R with the stepAIC function like so:
```
library(MASS)  # provides stepAIC

# Start from the full model: all main effects plus all two-way interactions
fit <- lm(Sepal.Length ~ .^2, data = iris)

# Backward elimination guided by AIC
step_fit <- stepAIC(fit, direction = "backward")

# Summary of the path from the initial to the final model
step_fit$anova
#> Stepwise Model Path 
#> Analysis of Deviance Table
#> 
#> Initial Model:
#> Sepal.Length ~ (Sepal.Width + Petal.Length + Petal.Width + Species)^2
#> 
#> Final Model:
#> Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species + 
#>   Sepal.Width:Petal.Width + Petal.Length:Species + Petal.Width:Species
```
At each step it drops the term whose removal lowers the AIC the most, and it stops once no removal improves the AIC. There is a lot more that goes into model selection, and a lot of things to consider and adjust, so this is not a prescriptive guide; I just wanted to bring it up as something to consider.
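If you want to check what the search actually bought you, one rough way is to compare the AIC of the starting model with that of the selected one and list the terms that were dropped. This is only a sketch on the same `iris` example; the names `full`, `selected`, `aic_table` and `dropped` are made up for illustration.

```r
library(MASS)  # provides stepAIC

# Full model with all two-way interactions, then backward selection (trace silenced)
full <- lm(Sepal.Length ~ .^2, data = iris)
selected <- stepAIC(full, direction = "backward", trace = FALSE)

# Side-by-side degrees of freedom and AIC for both models
aic_table <- AIC(full, selected)
print(aic_table)

# Terms present in the full model but absent from the selected one
dropped <- setdiff(attr(terms(full), "term.labels"),
                   attr(terms(selected), "term.labels"))
print(dropped)
```

A lower AIC for the selected model only says it is preferred under this criterion on this data; it is not evidence that it will predict better on new data.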