Question
Using the R MASS package to do a linear discriminant analysis, is there a way to get a measure of variable importance?
library(MASS)
### import data and do some preprocessing
fit <- lda(cat~., data=train)
I have a data set with about 20 measurements to predict a binary category. The measurements are hard to obtain, so I want to reduce them to the most influential ones.
When using rpart or randomForest I can get a list of variable importance, or a Gini decrease statistic, using summary() or importance().
Is there a built-in function to do this that I can't find? Or, if I have to code one, what would be a good way to go about it?
Answer 1:
I would recommend using the caret package, which can do recursive feature elimination (rfe) with an LDA model.
library(caret)
data(mdrr)
mdrrDescr <- mdrrDescr[, -nearZeroVar(mdrrDescr)]
mdrrDescr <- mdrrDescr[, -findCorrelation(cor(mdrrDescr), .8)]
set.seed(1)
inTrain <- createDataPartition(mdrrClass, p = .75, list = FALSE)[,1]
train <- mdrrDescr[ inTrain, ]
test <- mdrrDescr[-inTrain, ]
trainClass <- mdrrClass[ inTrain]
testClass <- mdrrClass[-inTrain]
set.seed(2)
ldaProfile <- rfe(train, trainClass,
                  sizes = c(1:10, 15, 30),
                  rfeControl = rfeControl(functions = ldaFuncs, method = "cv"))
postResample(predict(ldaProfile, test), testClass)
Once ldaProfile has been created, you can retrieve the best subset of variables it selected:
ldaProfile$optVariables
[1] "X5v" "VRA1" "D.Dr06" "Wap" "G1" "Jhetm" "QXXm"
[8] "nAB" "H3D" "nR06" "TI2" "nBnz" "Xt" "VEA1"
[15] "TIE"
You can also plot the number of variables used against accuracy with plot(ldaProfile).
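As a rough, self-contained sketch of inspecting an rfe result, the snippet below runs the same kind of profile on a two-class subset of iris. The data set, the object name profile, and the sizes are assumptions chosen for illustration and are not part of the original answer.

```r
library(caret)

# Illustration data (an assumption): two-class subset of iris
d <- droplevels(subset(iris, Species != "setosa"))

set.seed(2)
profile <- rfe(d[, 1:4], d$Species,
               sizes = 1:3,
               rfeControl = rfeControl(functions = ldaFuncs, method = "cv"))

# Resampled performance for each subset size that was tried
profile$results

# Names of the variables in the selected subset
predictors(profile)

# Number of variables vs. cross-validated accuracy
print(plot(profile, type = c("g", "o")))
```

predictors(profile) returns the same names as profile$optVariables, and profile$results holds the numbers behind the plot.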
Answer 2:
One option would be to employ permutation importance.
Fit the LDA model, then randomly shuffle the values of one feature column at a time, leaving the other columns unchanged, and compare the resulting prediction score with the baseline (non-permuted) score. Scoring on held-out data is preferable.
The more the permuted score drops relative to the baseline, the more important that feature is. You can then pick a cutoff and keep only those features for which baseline score - permuted score exceeds it.
There is a nice tutorial on Kaggle for this topic. It uses Python instead of R, but the concept is directly applicable here.
https://www.kaggle.com/dansbecker/permutation-importance
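A minimal R sketch of this idea with MASS::lda follows. The helper name perm_importance, its n_rep argument, and the two-class iris subset are all assumptions made for illustration; accuracy is used as the score, but any other metric could be substituted.

```r
library(MASS)

# Hypothetical helper: mean drop in accuracy when one column is shuffled.
# fit      - an lda model fitted with the formula interface
# data     - held-out data frame containing predictors and the response
# response - name of the response column
# n_rep    - number of shuffles averaged per feature
perm_importance <- function(fit, data, response, n_rep = 20) {
  accuracy <- function(d) mean(predict(fit, d)$class == d[[response]])
  baseline <- accuracy(data)
  features <- setdiff(names(data), response)
  sapply(features, function(f) {
    drops <- replicate(n_rep, {
      shuffled <- data
      shuffled[[f]] <- sample(shuffled[[f]])  # permute values within this column only
      baseline - accuracy(shuffled)
    })
    mean(drops)
  })
}

# Illustration data (an assumption): two-class subset of iris
set.seed(1)
d <- droplevels(subset(iris, Species != "setosa"))
idx <- sample(nrow(d), 70)
fit <- lda(Species ~ ., data = d[idx, ])

imp <- perm_importance(fit, d[-idx, ], "Species")
sort(imp, decreasing = TRUE)
```

Features whose shuffling barely changes the accuracy are candidates for removal. With correlated predictors the drops can understate importance, so it may be worth re-fitting after dropping a feature to confirm.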
Source: https://stackoverflow.com/questions/23900932/linear-discriminant-analysis-variable-importance