cost function in cv.glm of boot library in R

忘掉有多难 2021-02-06 10:27

I am trying to use the cross-validation function cv.glm from the boot library in R to determine the number of misclassifications when a glm logistic regression is applied.

4 answers
  •  心在旅途
    2021-02-06 10:59

    You can optionally supply your own cost function if you prefer something other than the default, which is the average squared error. To do so, write a function that takes two inputs and returns the cost you want to measure: (1) the vector of known labels you are predicting, and (2) the vector of predicted probabilities from your model for those labels. For the cost function that (I think) you described in your post, you want a function that returns the proportion of correct classifications at a 0.5 cutoff (the misclassification rate is one minus this value), which would look something like this:

    # proportion of correct classifications at a 0.5 probability cutoff
    cost <- function(labels, pred) {
      mean(labels == ifelse(pred > 0.5, 1, 0))
    }
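
    If what you are after is the misclassification rate rather than the accuracy, a minimal sketch could look like the following. The name `misclass_cost` and the `cutoff` argument are my own illustrative choices and not part of boot; the 0.5 threshold is an assumption carried over from the function above.

    ```r
    # Hypothetical helper (not part of boot): proportion of misclassified cases.
    # labels: observed 0/1 responses; pred: predicted probabilities.
    misclass_cost <- function(labels, pred, cutoff = 0.5) {
      predicted_class <- ifelse(pred > cutoff, 1, 0)
      mean(labels != predicted_class)
    }

    # Toy check with made-up values: the 2nd and 4th cases are misclassified
    labels <- c(1, 0, 1, 0)
    pred   <- c(0.9, 0.7, 0.6, 0.8)
    misclass_cost(labels, pred)  # 0.5
    ```

    Multiplying that rate by the number of held-out cases gives the misclassification count.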
    

    With that function defined, you can pass it to your cv.glm() call, although I wouldn't recommend using your own cost function over the default unless you have a reason to. Your example isn't reproducible, so here is another example using the nodal data set from boot:

    > library(boot)
    > 
    > cost <- function(labels,pred){
    +   mean(labels==ifelse(pred > 0.5, 1, 0))
    + }
    > 
    > # fit a logistic regression to the nodal data
    > nodal.glm <- glm(r ~ stage+xray+acid, binomial, data = nodal)
    > # run leave-one-out CV (K = number of rows) with your cost function
    > (nodal.glm.err <- cv.glm(nodal, nodal.glm, cost, nrow(nodal)))
    $call
    cv.glm(data = nodal, glmfit = nodal.glm, cost = cost, K = nrow(nodal))
    
    $K
    [1] 53
    
    $delta
    [1] 0.8113208 0.8113208
    
    $seed
      [1]         403         213 -2068233650  1849869992 -1836368725 -1035813431  1075589592  -782251898
    ...
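
    To get misclassifications out of the same nodal example, one option is to swap in a cost that counts wrong rather than right predictions. This is only a sketch: `misclass_cost`, `err_rate` and `n_miss` are names I made up, and the 0.5 cutoff is assumed as before.

    ```r
    library(boot)  # provides cv.glm() and the nodal data set

    # Hypothetical cost (illustrative name): proportion of wrong predictions at a 0.5 cutoff
    misclass_cost <- function(labels, pred) {
      mean(labels != ifelse(pred > 0.5, 1, 0))
    }

    nodal.glm <- glm(r ~ stage + xray + acid, binomial, data = nodal)

    # Leave-one-out CV: K equals the number of rows
    cv.err <- cv.glm(nodal, nodal.glm, misclass_cost, K = nrow(nodal))

    err_rate <- cv.err$delta[1]         # raw CV estimate of the misclassification rate
    n_miss   <- err_rate * nrow(nodal)  # implied number of misclassified cases
    err_rate
    n_miss
    ```

    Since the accuracy reported above is 0.8113208 over 53 cases (43 correct), this should come out at roughly 0.1887, or 10 misclassified cases. With K smaller than nrow(nodal) the folds are drawn at random, so call set.seed() first if you need repeatable numbers.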
    
