How to find important factors in a support vector machine

Submitted by 时光毁灭记忆、已成空白 on 2021-01-28 03:23:26

Question


The original data are too large to post here. I am using the e1071 package in R to fit a support vector machine. The original data have 100 factors, and the predicted outcome is 1 or 0. For example, here is a randomly generated data frame with 10 factors:

set.seed(1)                                    # make the random example reproducible
value <- matrix(runif(100, 5, 10), nrow = 10)  # 10 observations x 10 factors
y <- sample(0:1, 10, replace = TRUE)           # binary outcome
data <- as.data.frame(cbind(y, value))

I have done the prediction part, but I wonder how to determine which of the 10 factors are important (most strongly related) to the result.

For example, the answer might be that factors 2, 4, 5, and 10 contribute to the final result.

Can you help me with this? Thank you so much.


Answer 1:


There is no simple, complete answer to this question, but here is an example to get started:

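library(rpart)
library(e1071)

# Tree model: Species is a factor, so rpart fits a classification tree
cat('Regression tree case:\n')
fit1 <- rpart(Species ~ ., data = iris)
print(fit1$variable.importance)

# SVM model (default radial kernel; inputs are scaled by default)
cat('SVM model case:\n')
fit2 <- svm(Species ~ ., data = iris)
w <- t(fit2$coefs) %*% fit2$SV                 # combine coefficients with support vectors
w <- apply(w, 2, function(v){sqrt(sum(v^2))})  # Euclidean norm per variable
w <- sort(w, decreasing = TRUE)
print(w)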

The script above prints:

Regression tree case:
 Petal.Width Petal.Length Sepal.Length  Sepal.Width 
    88.96940     81.34496     54.09606     36.01309 

SVM model case:
Petal.Length  Petal.Width Sepal.Length  Sepal.Width 
   12.160093    11.737364     6.623965     4.722632 

The two models rank the variables similarly: both place the petal measurements above the sepal measurements.

This is only one of many ways to interpret SVM results.
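The product of `coefs` and `SV` is a true weight vector in the input space only for a linear kernel; with the default radial kernel it is better read as a heuristic score. Since the question concerns a 0/1 outcome, the following is a minimal sketch of the linear-kernel variant on a two-class problem. The two-class subset of `iris` and the names `iris2`, `fit`, and `w` are assumptions made for illustration, not part of the original answer.

```r
library(e1071)

# Assumption: keep two of the three species so that the model has a
# single decision function, as in a 0/1 problem
iris2 <- droplevels(subset(iris, Species != "setosa"))

# Linear kernel: the weight vector lives in the (scaled) input space
fit <- svm(Species ~ ., data = iris2, kernel = "linear")

# One weight per predictor; svm() scales the inputs by default, so the
# absolute values can be compared across predictors
w <- drop(t(fit$coefs) %*% fit$SV)
print(sort(abs(w), decreasing = TRUE))
```

The sign of each weight only indicates which class a predictor pushes toward, so the ranking uses absolute values.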

See the following paper for more information: "An Introduction to Variable and Feature Selection", http://jmlr.csail.mit.edu/papers/v3/guyon03a.html
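A kernel-independent alternative is permutation importance: shuffle one predictor at a time and measure how much the accuracy drops. The sketch below is an illustration rather than part of the original answer. It assumes the same `iris` model as above, scores on the training data for brevity (a held-out set would be preferable), and uses 20 shuffles per predictor, an arbitrary choice. The names `baseline`, `predictors`, and `drop_in_acc` are hypothetical.

```r
library(e1071)

set.seed(42)  # shuffling is random; fix the seed for repeatability
fit <- svm(Species ~ ., data = iris)

# Accuracy of the fitted model on the unshuffled data
baseline <- mean(predict(fit, iris) == iris$Species)

predictors <- setdiff(names(iris), "Species")
drop_in_acc <- sapply(predictors, function(v) {
  # Average the accuracy drop over several shuffles to reduce noise
  mean(replicate(20, {
    shuffled <- iris
    shuffled[[v]] <- sample(shuffled[[v]])  # break the link between v and Species
    baseline - mean(predict(fit, shuffled) == shuffled$Species)
  }))
})

# Larger drop = the model relies more on that predictor
print(sort(drop_in_acc, decreasing = TRUE))
```

Because only `predict()` is needed, the same loop works for any kernel and for the 0/1 outcome in the question.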



Source: https://stackoverflow.com/questions/34781495/how-to-find-important-factors-in-support-vector-machine
