Optimising caret for sensitivity still seems to optimise for ROC

渐次进展 2021-01-21 15:09

I'm trying to maximise sensitivity in my model selection in caret using rpart. To this end, I tried to replicate the method given here (scroll down to the example

1 Answer
  • 2021-01-21 15:40

    You have over-complicated things.

    twoClassSummary already reports sensitivity in its output, under the column name "Sens". It is enough to specify:

    metric = "Sens" in train, and summaryFunction = twoClassSummary (together with classProbs = TRUE, which twoClassSummary needs) in trainControl

    Full example:

    library(caret)
    library(mlbench)
    data(Sonar)
    
    rpart_caret_fit <- train(Class~., 
                             data = Sonar,
                             method = "rpart", 
                             tuneLength = 20, 
                             metric = "Sens", 
                             maximize = TRUE,
                             trControl = trainControl(classProbs = TRUE,
                                                      method = "cv",
                                                      number = 5,
                                                      summaryFunction = twoClassSummary))
    
    rpart_caret_fit
    CART 
    
    208 samples
     60 predictor
      2 classes: 'M', 'R' 
    
    No pre-processing
    Resampling: Cross-Validated (5 fold) 
    Summary of sample sizes: 167, 166, 166, 166, 167 
    Resampling results across tuning parameters:
    
      cp         ROC        Sens       Spec     
      0.0000000  0.7088298  0.7023715  0.7210526
      0.0255019  0.7075400  0.7292490  0.6684211
      0.0510038  0.7105388  0.7758893  0.6405263
      0.0765057  0.6904202  0.7841897  0.6294737
      0.1020076  0.7104681  0.8114625  0.6094737
      0.1275095  0.7104681  0.8114625  0.6094737
      0.1530114  0.7104681  0.8114625  0.6094737
      0.1785133  0.7104681  0.8114625  0.6094737
      0.2040152  0.7104681  0.8114625  0.6094737
      0.2295171  0.7104681  0.8114625  0.6094737
      0.2550190  0.7104681  0.8114625  0.6094737
      0.2805209  0.7104681  0.8114625  0.6094737
      0.3060228  0.7104681  0.8114625  0.6094737
      0.3315247  0.7104681  0.8114625  0.6094737
      0.3570266  0.7104681  0.8114625  0.6094737
      0.3825285  0.7104681  0.8114625  0.6094737
      0.4080304  0.7104681  0.8114625  0.6094737
      0.4335323  0.7104681  0.8114625  0.6094737
      0.4590342  0.6500135  0.8205534  0.4794737
      0.4845361  0.6500135  0.8205534  0.4794737
    
    Sens was used to select the optimal model using the largest value.
    The final value used for the model was cp = 0.4845361.
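
    To confirm which metric drove the selection, the fitted object can be inspected directly. The following is a minimal sketch rather than part of the original answer: the object name fit, the seed, and the smaller tuneLength = 5 are assumptions made only to keep it short.

    ```r
    library(caret)
    library(mlbench)
    data(Sonar)

    set.seed(1)  # assumption: any seed works, it only makes the folds repeatable
    fit <- train(Class ~ .,
                 data = Sonar,
                 method = "rpart",
                 tuneLength = 5,
                 metric = "Sens",
                 maximize = TRUE,
                 trControl = trainControl(classProbs = TRUE,
                                          method = "cv",
                                          number = 5,
                                          summaryFunction = twoClassSummary))

    fit$metric                                     # the metric train() used for selection
    fit$results[, c("cp", "ROC", "Sens", "Spec")]  # resampled metrics for each cp value
    fit$bestTune                                   # the cp value that was selected

    # Sens at the selected cp, to compare against the largest resampled Sens
    fit$results$Sens[fit$results$cp == fit$bestTune$cp]
    max(fit$results$Sens)
    ```

    If the last two values agree and fit$metric reports "Sens", the selection was made on sensitivity and not on ROC, even though ROC is still listed in the table.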
    

    Additionally, regarding "I do not think you can specify control = rpart.control(maxdepth = 6) to caret train": this is not correct. caret forwards any extra arguments to the underlying model function through its ... argument, so you can pass pretty much any argument.
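
    As a rough sketch of passing an rpart argument through train, assuming the same Sonar setup as above (the object name fit_depth, the seed, and tuneLength = 5 are illustrative choices, not from the original post):

    ```r
    library(caret)
    library(rpart)
    library(mlbench)
    data(Sonar)

    set.seed(1)  # assumption: seed chosen only for repeatability
    fit_depth <- train(Class ~ .,
                       data = Sonar,
                       method = "rpart",
                       tuneLength = 5,
                       metric = "Sens",
                       maximize = TRUE,
                       trControl = trainControl(classProbs = TRUE,
                                                method = "cv",
                                                number = 5,
                                                summaryFunction = twoClassSummary),
                       control = rpart.control(maxdepth = 6))  # forwarded to rpart via ...

    # The control settings stored on the final rpart model should reflect the value passed in
    fit_depth$finalModel$control$maxdepth
    ```

    Note that cp is still tuned by caret, so a cp set inside rpart.control would be overridden by the tuning grid.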

    If you want to write your own summary function, here is an example that computes "Sens":

    Sensitivity.fc <- function (data, lev = NULL, model = NULL) { # every summary function takes these three arguments
      obs <- data[, "obs"] # the observed classes - always in the column named "obs" in data
      cls <- levels(obs) # the class levels - caret also passes these in the lev argument
      probs <- data[, cls[2]] # probabilities for the 2nd class - only available if classProbs = TRUE
      class <- factor(ifelse(probs > 0.5, cls[2], cls[1]), levels = cls) # assign classes using a 0.5 probability threshold; levels = cls keeps both levels even if only one class is predicted
      Sensitivity <- caret::sensitivity(class, obs) # do the calculation with caret's built-in function (the first level is treated as the positive class)
      names(Sensitivity) <- "Sens" # the name of the output, which is what metric = "Sens" refers to
      Sensitivity
    }
    

    and now:

    rpart_caret_fit <- train(Class~., 
                             data = Sonar,
                             method = "rpart", 
                             tuneLength = 20, 
                             metric = "Sens", #because of this line: names(Sensitivity) <- "Sens" 
                             maximize = TRUE,
                             trControl = trainControl(classProbs = TRUE,
                                                      method = "cv",
                                                      number = 5,
                                                      summaryFunction = Sensitivity.fc))
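
    As an aside, the data frame handed to a summary function also carries the hard class predictions in a "pred" column, so a summary function does not have to threshold the probabilities itself. A minimal sketch under that assumption; the function name sens_from_pred and the toy data are hypothetical and only illustrate the interface:

    ```r
    library(caret)

    # Hypothetical variant: computes sensitivity from the "pred" column supplied by caret
    sens_from_pred <- function(data, lev = NULL, model = NULL) {
      # lev[1] is treated as the positive class
      out <- caret::sensitivity(data[, "pred"], data[, "obs"], positive = lev[1])
      names(out) <- "Sens"  # the name that metric = "Sens" would refer to
      out
    }

    # Toy input shaped like what caret passes to a summary function (made-up values)
    toy <- data.frame(
      obs  = factor(c("M", "M", "M", "R", "R"), levels = c("M", "R")),
      pred = factor(c("M", "M", "R", "R", "M"), levels = c("M", "R"))
    )

    sens_from_pred(toy, lev = levels(toy$obs))
    ```

    In the toy data two of the three observed "M" cases are predicted as "M", so the call returns a sensitivity of 2/3.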
    

    Let's check whether both produce the same results:

    set.seed(1)
    fit_sens <- train(Class~., 
                      data = Sonar,
                      method = "rpart", 
                      tuneLength = 20, 
                      metric = "Sens", 
                      maximize = TRUE,
                      trControl = trainControl(classProbs = TRUE,
                                               method = "cv",
                                               number = 5,
                                               summaryFunction = Sensitivity.fc))
    
    set.seed(1)
    fit_sens2 <- train(Class~., 
                       data = Sonar,
                       method = "rpart", 
                       tuneLength = 20, 
                       metric = "Sens", 
                       maximize = TRUE,
                       trControl = trainControl(classProbs = TRUE,
                                                method = "cv",
                                                number = 5,
                                                summaryFunction = twoClassSummary))
    
    all.equal(fit_sens$results[c("cp", "Sens")],
              fit_sens2$results[c("cp", "Sens")])  
    
    TRUE
    
    all.equal(fit_sens$bestTune,
              fit_sens2$bestTune)
    TRUE
    