(in R) Why is the result of ksvm using a user-defined linear kernel different from that of ksvm using "vanilladot"?

渐次进展 2021-01-18 19:04

I wanted to use a user-defined kernel function for ksvm in R, so as practice I tried to write a vanilladot kernel and compare it with the "vanilladot" kernel that is built into "kernlab".

1 answer
  • 2021-01-18 19:45

    First, it seems like a really good question!

    Now to the point. In the source of ksvm we can see where the line is drawn between a user-defined kernel and the built-in ones:

     if (type(ret) == "spoc-svc") {
                if (!is.null(class.weights)) 
                    weightedC <- class.weights[weightlabels] * rep(C, 
                      nclass(ret))
                else weightedC <- rep(C, nclass(ret))
                yd <- sort(y, method = "quick", index.return = TRUE)
                xd <- matrix(x[yd$ix, ], nrow = dim(x)[1])
                count <- 0
                if (ktype == 4) 
                    K <- kernelMatrix(kernel, x)
                resv <- .Call("tron_optim", as.double(t(xd)), as.integer(nrow(xd)), 
                    as.integer(ncol(xd)), as.double(rep(yd$x - 1, 
                      2)), as.double(K), as.integer(if (sparse) xd@ia else 0), 
                    as.integer(if (sparse) xd@ja else 0), as.integer(sparse), 
                    as.integer(nclass(ret)), as.integer(count), as.integer(ktype), 
                    as.integer(7), as.double(C), as.double(epsilon), 
                    as.double(sigma), as.integer(degree), as.double(offset), 
                    as.double(C), as.double(2), as.integer(0), as.double(0), 
                    as.integer(0), as.double(weightedC), as.double(cache), 
                    as.double(tol), as.integer(10), as.integer(shrinking), 
                    PACKAGE = "kernlab")
                reind <- sort(yd$ix, method = "quick", index.return = TRUE)$ix
                alpha(ret) <- t(matrix(resv[-(nclass(ret) * nrow(xd) + 
                    1)], nclass(ret)))[reind, , drop = FALSE]
                coef(ret) <- lapply(1:nclass(ret), function(x) alpha(ret)[, 
                    x][alpha(ret)[, x] != 0])
                names(coef(ret)) <- lev(ret)
                alphaindex(ret) <- lapply(sort(unique(y)), function(x) which(alpha(ret)[, 
                    x] != 0))
                xmatrix(ret) <- x
                obj(ret) <- resv[(nclass(ret) * nrow(xd) + 1)]
                names(alphaindex(ret)) <- lev(ret)
                svindex <- which(rowSums(alpha(ret) != 0) != 0)
                b(ret) <- 0
                param(ret)$C <- C
            }
    

    Two things matter here. If we provide ksvm with our own kernel, then ktype == 4 (whereas for vanillakernel, ktype == 0), which has two consequences:

    • for a user-defined kernel, the kernel matrix is computed up front instead of the kernel function being evaluated by the solver
    • the tron_optim routine is run with the information about the kernel type
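The distinction described above can be checked from the R side. The following is a minimal sketch, assuming the kernlab package is installed; `my_linear` is a hypothetical user-defined kernel written in the style shown in the kernlab documentation, and the first ten rows of `iris` are used only as sample data. It shows that the two kernel objects carry different classes while producing the same kernel matrix, so any difference in the fitted model would have to come from how the optimizer is run, not from the kernel values.

```r
# Sketch only: assumes the kernlab package is installed.
library(kernlab)

# Hypothetical user-defined linear kernel: a plain function of two vectors,
# tagged with class "kernel" as described in the kernlab documentation.
my_linear <- function(x, y) sum(x * y)
class(my_linear) <- "kernel"

# Built-in linear kernel object.
builtin <- vanilladot()

# The built-in object has class "vanillakernel"; the user-defined one is
# only a generic "kernel".
print(class(builtin))
print(class(my_linear))

# Compute the kernel (Gram) matrix with both kernels on the same sample data.
x <- as.matrix(iris[1:10, 1:4])
K_user    <- kernelMatrix(my_linear, x)
K_builtin <- kernelMatrix(builtin, x)

# Largest absolute entry-wise difference between the two matrices.
max_diff <- max(abs(as.numeric(K_user) - as.numeric(K_builtin)))
print(max_diff)
```

If `max_diff` is at the level of floating-point rounding, the kernels themselves agree and the class of the kernel object is the only thing that differs.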

    Now, in svm.cpp we can find the tron routines, and in tron_run (called from tron_optim) we can see that the LINEAR kernel has a separate optimization routine:

    if (param->kernel_type == LINEAR)
        {
         /* lots of code here */
         while (Cpj < Cp)
           {
           totaliter += s.Solve(l, prob->x, minus_ones, y, alpha, w, 
                                Cpj, Cnj, param->eps, sii, param->shrinking, 
                                param->qpsize);
         /* lots of code here */
           }
         totaliter += s.Solve(l, prob->x, minus_ones, y, alpha, w, Cp, Cn,
                              param->eps, sii, param->shrinking, param->qpsize);
         delete[] w;
        }
    else
    {    
        Solver_B s;
        s.Solve(l, BSVC_Q(*prob,*param,y), minus_ones, y, alpha, Cp, Cn, 
        param->eps, sii, param->shrinking, param->qpsize);
    }
    

    As you can see, the linear case is handled in a more complex and more detailed way: there is an inner optimization loop that calls the solver many times. A full explanation would require a really deep analysis of the optimization being performed here, but at this stage your question can be answered as follows:

    • There is no error in what you did.
    • kernlab's SVM has a separate routine for training an SVM with a linear kernel, selected by the type of kernel passed to the code. Changing "kernel" to "vanillakernel" made ksvm think it was actually working with a vanillakernel, so it ran this separate optimization routine.
    • This does not seem to be a bug, since the linear SVM is in fact very different from the kernelized version in terms of efficient optimization techniques. The number of heuristics and numerical issues that have to be taken care of is large. As a result, some approximations are required and can lead to different results. For a rich feature space (like those induced by the RBF kernel) this should not really matter, but for simple kernels like linear ones these simplifications can lead to significant output changes.
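A comparison of the two fits can be sketched as follows. This is an illustration under stated assumptions, not the original poster's code: it assumes the kernlab package is installed, uses a two-class subset of `iris` chosen only for convenience, leaves the SVM type at the ksvm default, and again uses the hypothetical `my_linear` kernel. Whether the two models differ, and by how much, depends on the data, the SVM type, and the settings.

```r
# Sketch only: assumes the kernlab package is installed.
library(kernlab)

# Hypothetical user-defined linear kernel (same definition as a dot product).
my_linear <- function(x, y) sum(x * y)
class(my_linear) <- "kernel"

# Two-class subset of iris, used here purely as sample data.
d <- droplevels(iris[iris$Species != "setosa", ])
x <- as.matrix(d[, 1:4])
y <- d$Species

# Same data and same C; only the kernel argument differs.
fit_builtin <- ksvm(x, y, kernel = "vanilladot", C = 1)
fit_user    <- ksvm(x, y, kernel = my_linear, C = 1)

# Compare a few summary quantities of the two fitted models.
cat("support vectors (built-in):", nSV(fit_builtin), "\n")
cat("support vectors (user):    ", nSV(fit_user), "\n")
cat("training error (built-in): ", error(fit_builtin), "\n")
cat("training error (user):     ", error(fit_user), "\n")

# Fraction of training points on which the two models predict the same class.
agreement <- mean(predict(fit_builtin, x) == predict(fit_user, x))
cat("prediction agreement:      ", agreement, "\n")
```

Comparing the number of support vectors, the training error, and the prediction agreement gives a quick view of how far the two code paths diverge on a given data set.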