Question
The original data are large, so I cannot post them here. I am using the e1071 package in R to do a support vector machine analysis. The original data have 100 factors, and the predicted result is 1 or 0. For example, I generate a random data frame with 10 factors:
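factor <- c()  # initialize first; otherwise `factor` refers to the base R function
for (i in 1:10){
factor<-c(factor,runif(10,5,10))  # append 10 random values in [5, 10]
}
value<-matrix(factor,nrow=10)  # 10 rows x 10 factors
y<-sample(0:1,10,replace=TRUE)  # random binary outcome
data<-as.data.frame(cbind(y,value))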
I have done the prediction part, but I wonder how to determine which factors (among the 10) are important (more related) to the result.
For example, the result might be that factors 2, 4, 5, and 10 contribute to the final result.
Can you help me with this? Thank you so much.
Answer 1:
A complete answer to this question is not simple. Here is an example for getting started on this subject:
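library(rpart)
library(e1071)
# Species is a factor, so rpart fits a classification tree here
cat('Regression tree case:\n')
fit1 <- rpart(Species ~ ., data=iris)
print(fit1$variable.importance)
cat('SVM model case:\n')
# svm() uses a radial kernel by default, so the weights below are a heuristic;
# they are exact hyperplane weights only with kernel = "linear"
fit2 <- svm(Species ~ ., data = iris)
w <- t(fit2$coefs) %*% fit2$SV # weight vectors, one row per column of coefs
w <- apply(w, 2, function(v){sqrt(sum(v^2))}) # Euclidean norm per feature
w <- sort(w, decreasing = TRUE)
print(w)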
The script above prints:
Regression tree case:
 Petal.Width Petal.Length Sepal.Length  Sepal.Width
    88.96940     81.34496     54.09606     36.01309
SVM model case:
Petal.Length  Petal.Width Sepal.Length  Sepal.Width
   12.160093    11.737364     6.623965     4.722632
As you can see, the variable importance rankings from the two models are similar.
This is one of many methods of interpreting SVM results.
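Another of those methods, which does not depend on the kernel, is permutation importance: shuffle one feature at a time and measure how much the accuracy drops. The following is only a minimal sketch under some assumptions: the seed and the number of shuffles (n_rep) are arbitrary choices, and accuracy is measured on the training data for brevity, whereas a held-out set would give a more honest estimate.

```r
library(e1071)

set.seed(42)  # arbitrary seed, only to make the shuffles reproducible
fit <- svm(Species ~ ., data = iris)

# Baseline accuracy on the training data (assumption: no held-out set here)
base_acc <- mean(predict(fit, iris) == iris$Species)

features <- setdiff(names(iris), "Species")
n_rep <- 20  # number of shuffles per feature (arbitrary choice)

perm_importance <- sapply(features, function(f) {
  drops <- replicate(n_rep, {
    shuffled <- iris
    shuffled[[f]] <- sample(shuffled[[f]])  # break the link between f and Species
    base_acc - mean(predict(fit, shuffled) == shuffled$Species)
  })
  mean(drops)  # average accuracy drop caused by shuffling f
})

print(sort(perm_importance, decreasing = TRUE))
```

A larger average drop suggests the model relies more on that feature. Values near zero suggest the feature contributes little to the predictions.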
See the following paper for more information: "An Introduction to Variable and Feature Selection", http://jmlr.csail.mit.edu/papers/v3/guyon03a.html
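For a two-class problem like the one in the question, the weight approach can be sketched with a linear kernel, where the weights are the actual hyperplane coefficients (in the scaled feature space, since svm() scales inputs by default). This is only an illustration: the data are random stand-ins shaped like the question's example, the seed is arbitrary, and the classes are balanced by construction so that both appear. Because the data carry no real signal, the ranking it prints is meaningless.

```r
library(e1071)

set.seed(1)  # arbitrary seed for reproducibility
# Random stand-in data: 10 rows, 10 numeric factors (columns V1..V10)
value <- matrix(runif(100, 5, 10), nrow = 10)
data <- as.data.frame(value)
# Binary outcome as a factor so that svm() does classification;
# shuffling a balanced vector guarantees both classes are present
data$y <- factor(sample(rep(0:1, 5)))

fit <- svm(y ~ ., data = data, kernel = "linear")

# With two classes, coefs has one column, so w is a single weight vector
w <- drop(t(fit$coefs) %*% fit$SV)
print(sort(abs(w), decreasing = TRUE))  # larger |w| = more influential factor
```

On real data, the factors with the largest absolute weights would be the candidates for "important" factors, subject to checking the result against another method.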
Source: https://stackoverflow.com/questions/34781495/how-to-find-important-factors-in-support-vector-machine