Question
I have observed a very strange behavior when tuning SVM parameters with caret. When training a single model without tuning, an SVM with a radial basis kernel takes more time than an SVM with a linear kernel, which is expected. However, when tuning SVMs with both kernels over the same penalty (C) grid, the SVM with the linear kernel takes substantially more time than the SVM with the radial basis kernel. This behavior is easy to reproduce on both Windows and Linux with R 3.2 and caret 6.0-47. Does anyone know why tuning the linear SVM takes so much more time than tuning the radial basis kernel SVM?
Timings in seconds, as reported by system.time():
                    user  system  elapsed
SVM linear          0.51    0.00     0.52
SVM radial          0.85    0.00     0.84
SVM linear tuning 129.98    0.02   130.08
SVM radial tuning   2.44    0.05     2.48
The toy example code is below:
library(data.table)
library(kernlab)
library(caret)

set.seed(1)  # fixed seed so the example is reproducible
n <- 1000
p <- 10
# Random binary labels and p standard-normal features, i.e. no real signal
dat <- data.table(y = as.factor(sample(c('p', 'n'), n, replace = TRUE)))
dat[, (paste0('x', 1:p)) := lapply(1:p, function(x) rnorm(n, 0, 1))]
dat <- as.data.frame(dat)

# sigest() returns the 0.1, 0.5 and 0.9 quantiles of a plausible sigma range;
# averaging the two outer values gives one sigma for the RBF kernel
sigmas <- sigest(as.matrix(dat[, -1]), na.action = na.omit, scaled = TRUE)
sigma <- mean(as.vector(sigmas[-2]))

# Single models: tuneLength = 1 evaluates one candidate per kernel
cat('\nSVM linear\n')
print(system.time(fit1 <- train(y ~ ., data = dat, method = 'svmLinear', tuneLength = 1,
                                trControl = trainControl(method = 'cv', number = 3))))
cat('\nSVM radial\n')
print(system.time(fit2 <- train(y ~ ., data = dat, method = 'svmRadial', tuneLength = 1,
                                trControl = trainControl(method = 'cv', number = 3))))

# Tuning: the same five C values (2^-5, 2^0, 2^5, 2^10, 2^15) for both kernels
cat('\nSVM linear tuning\n')
print(system.time(fit3 <- train(y ~ ., data = dat, method = 'svmLinear',
                                tuneGrid = expand.grid(C = 2 ^ seq(-5, 15, 5)),
                                trControl = trainControl(method = 'cv', number = 3))))
cat('\nSVM radial tuning\n')
print(system.time(fit4 <- train(y ~ ., data = dat, method = 'svmRadial',
                                tuneGrid = expand.grid(C = 2 ^ seq(-5, 15, 5), sigma = sigma),
                                trControl = trainControl(method = 'cv', number = 3))))
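A quick way to check whether the extra time comes from caret or from kernlab itself, and which penalty values account for it, is to time kernlab's ksvm() directly for each C in the grid above, with no caret and no resampling. This is a sketch under assumptions: it regenerates similar random data with base R instead of data.table, reuses the question's sigma heuristic, and the names time_fit, kernels and timings are illustrative. Absolute timings will differ by machine.

```r
# Sketch: time kernlab::ksvm directly (no caret, no resampling) for each
# value in the C grid, for both kernels. Assumes kernlab is installed.
library(kernlab)

set.seed(1)
n <- 1000
p <- 10
# Same kind of data as the question: random labels, standard-normal features
x <- matrix(rnorm(n * p), nrow = n, dimnames = list(NULL, paste0("x", 1:p)))
y <- factor(sample(c("p", "n"), n, replace = TRUE))

# Same sigma heuristic as the question: mean of the 0.1 and 0.9 sigest quantiles
sigma <- mean(as.vector(sigest(x, scaled = TRUE)[-2]))

cs <- 2 ^ seq(-5, 15, 5)
kernels <- list(linear = vanilladot(), radial = rbfdot(sigma = sigma))

# Elapsed wall-clock seconds for one C-classification fit
time_fit <- function(kern, C) {
  unname(system.time(
    ksvm(x, y, type = "C-svc", kernel = kern, C = C)
  )["elapsed"])
}

timings <- data.frame(
  C = cs,
  linear = vapply(cs, function(C) time_fit(kernels$linear, C), numeric(1)),
  radial = vapply(cs, function(C) time_fit(kernels$radial, C), numeric(1))
)
print(timings)
```

If the slowdown lives in kernlab's solver rather than in caret, it should reappear here, concentrated in whichever C values are slow for the linear kernel.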
Answer 1:
After taking a look, I don't believe the issue is with caret, but rather with what's going on behind (way behind) the scenes in kernlab.
As has been stated elsewhere on Stack Overflow, SVM itself is an intensive algorithm: the time complexity of SVM training is roughly O(n^2) in the number of training examples n. That alone doesn't account for the difference between the two kinds of SVM calls, though, since n is the same for both kernels. What does seem to be happening is that the time is spent after the call into compiled C code, at the end of a very deep stack ending in SVM > .local > .Call (.Call being R's interface to compiled C code, which is outside my knowledge base). Most of the time, when you see unexpectedly slow timings moving from R to C, it's because of how things are passed. Since you're passing in a matrix, this lends further support to the assumption that a naming or dimensions issue is causing some extra work on the other end.
If we look at how this code profiles, the bottleneck becomes pretty clear. It's a deep stack, so the overall shape of each profile tells the story more than the individual functions do.
SVM linear looks like a healthy profile, with lots of friendly R functions.
Same deal for SVM radial.
Once we get to radial tuning, we start to see a flatter structure, with the try-call stacks starting to skew, but everything still seems to execute quickly.
Whoa. Linear tuning has a completely different structure, with C calls taking over 100 seconds in some cases.
So it looks like your bottleneck is in the compiled C code from kernlab. Since the package builds on libsvm, which seems to be pretty efficient, I can't imagine there is an actual issue with the code being called. Actually identifying how (a safety feature, or an input issue from R) and why the issue occurs when moving from one to the other is a job for someone better than I.
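A text-based version of the profiling described in this answer can be obtained with base R's sampling profiler, utils::Rprof() and summaryRprof(). This sketch assumes caret and kernlab are installed, regenerates similar random data with base R, and profiles only the slow linear tuning call; prof_file and fit_lin are illustrative names. Expect it to take about as long as the linear tuning run in the question.

```r
# Sketch: profile the slow linear tuning call with base R's sampling
# profiler. Assumes caret and kernlab are installed.
library(caret)

set.seed(1)
n <- 1000
p <- 10
# Random labels plus p standard-normal feature columns named x1..xp
dat <- data.frame(y = factor(sample(c("p", "n"), n, replace = TRUE)),
                  matrix(rnorm(n * p), nrow = n,
                         dimnames = list(NULL, paste0("x", 1:p))))

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)   # start sampling the call stack
fit_lin <- train(y ~ ., data = dat, method = "svmLinear",
                 tuneGrid = expand.grid(C = 2 ^ seq(-5, 15, 5)),
                 trControl = trainControl(method = "cv", number = 3))
Rprof(NULL)        # stop profiling

prof <- summaryRprof(prof_file)
# Functions ranked by time spent in the function itself
print(head(prof$by.self, 10))
```

If the time is going into compiled code, the by.self table should be dominated by a native-call entry rather than by R-level functions.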
Answer 2:
I ran into incredibly poor performance of svmRadial on Linux. It turns out that the issue was with using multicore via doMC. svmRadial runs fine on a single core. The kernlab functions are the only ones in caret that I've seen exhibit this behaviour. One more issue to add for kernlab, in addition to those mentioned by others.
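To check whether a parallel backend is the culprit, one option is to run the same svmRadial tuning sequentially and compare its timing with a run after registering doMC (for example doMC::registerDoMC(cores = 2) on Linux, as in this answer). The sketch below shows two ways to force sequential execution: foreach's registerDoSEQ(), and allowParallel = FALSE in trainControl(). It assumes caret, kernlab and foreach are installed; the data, tuneLength = 3 and the names fit_seq and timing are illustrative.

```r
# Sketch: run svmRadial tuning without a parallel backend.
# Assumes caret, kernlab and foreach are installed.
library(caret)
library(foreach)

# Option 1: replace any registered parallel backend (e.g. doMC)
# with foreach's sequential backend for the rest of the session
registerDoSEQ()

set.seed(1)
n <- 1000
p <- 10
dat <- data.frame(y = factor(sample(c("p", "n"), n, replace = TRUE)),
                  matrix(rnorm(n * p), nrow = n,
                         dimnames = list(NULL, paste0("x", 1:p))))

# Option 2: leave the backend registered but tell caret not to use it
ctrl <- trainControl(method = "cv", number = 3, allowParallel = FALSE)

timing <- system.time(
  fit_seq <- train(y ~ ., data = dat, method = "svmRadial",
                   tuneLength = 3, trControl = ctrl)
)
print(timing)
```

Either option alone is enough; both appear here so the sketch behaves the same whether or not a backend was registered earlier in the session.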
Source: https://stackoverflow.com/questions/30385347/r-caret-unusually-slow-when-tuning-svm-with-linear-kernel