How to specify a validation holdout set to caret

耶瑟儿~ 2021-01-03 07:46

I really like using caret for at least the early stages of modeling, especially for its really easy-to-use resampling methods. However, I'm working on a model where the t

2 Answers
  • 2021-01-03 08:29

    Take a look at trainControl. There are now options to directly specify the rows of the data that are used to model the data (the index argument) and which rows should be used to compute the hold-out estimates (called indexOut). I think that does what you are looking for.

    Max
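
As a rough illustration of the `index` / `indexOut` approach described above, here is a minimal sketch on the built-in `iris` data. The 100/50 split and the names `val_rows`, `train_rows`, and `fit` are assumptions made up for this example, and `method = "rf"` assumes the randomForest package is installed.

```r
library(caret)
set.seed(1)

# Hypothetical split: 50 randomly chosen iris rows form the validation set
val_rows   <- sample(nrow(iris), 50)
train_rows <- setdiff(seq_len(nrow(iris)), val_rows)

tc <- trainControl(
  method   = "cv",
  number   = 1,
  index    = list(Fold1 = train_rows),  # rows used to fit each candidate model
  indexOut = list(Fold1 = val_rows),    # rows used to compute hold-out estimates
  savePredictions = TRUE
)

fit <- train(Species ~ ., data = iris, method = "rf", trControl = tc)

# The saved hold-out predictions should cover exactly the validation rows
setequal(unique(fit$pred$rowIndex), val_rows)
```

Note that `train` refits the final model on every row passed in `data`, so the validation rows only stay out of the fitting during the tuning step.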

  • 2021-01-03 08:35

    I think I may have found a workaround for this, but I'm not 100% sure that it is doing what I want, and I am still hoping that someone comes up with something a bit more elegant. Anyway, I realized that it probably makes the most sense to include the validation set inside my training set and just define the resampling indices so that only the validation data are used for the hold-out estimates. I think this should do the trick for the example above:

    > library(caret)
    > set.seed(1)
    > 
    > #training/validation set indices
    > i <- sample(150,50) #note - I no longer need to explicitly create train/validation sets
    > 
    > #explicitly define the resampling indices: fit on the rows not in i, so the rows in i form the hold-out set
    > tc <- trainControl(method="cv",number=1,index=list(Fold1=(1:150)[-i]),savePredictions=TRUE)
    > (model.rf <- train(Species ~ ., data=iris,method="rf",trControl=tc))
    150 samples
      4 predictors
      3 classes: 'setosa', 'versicolor', 'virginica' 
    
    No pre-processing
    Resampling: Cross-Validation (1 fold) 
    
    Summary of sample sizes: 100 
    
    Resampling results across tuning parameters:
    
      mtry  Accuracy  Kappa
      2     0.94      0.907
      3     0.94      0.907
      4     0.94      0.907
    
    Accuracy was used to select the optimal model using  the largest value.
    The final value used for the model was mtry = 2. 
    > 
    > #I think this worked because the held-out row indices match the validation set?
    > all(sort(unique(model.rf$pred$rowIndex)) == sort(i))
    [1] TRUE
    > #the exact contingency table from above also indicates that this works
    > table(model.rf$pred[model.rf$pred$.mtry==model.rf$bestTune[[1]],c("obs","pred")])
                pred
    obs          setosa versicolor virginica
      setosa         17          0         0
      versicolor      0         20         2
      virginica       0          1        10
    