mlr3 PipeOps: Create branches with different data transformations and benchmark different learners within and between branches


Question


I'd like use PipeOps to train a learner on three alternative transformations of a dataset:

  1. No transformation.
  2. Class balancing: down-sampling.
  3. Class balancing: up-sampling.

Then, I'd like to benchmark the three learned models.

My idea was to set up the pipeline as follows:

  1. Make pipeline: Input -> Impute dataset (optional) -> Branch -> Split into the three branches described above -> Add the learner within each branch -> Unbranch.
  2. Train the pipeline and hope (that's where I'm getting it wrong) that there will be a result saved for each learner within each branch.

Unfortunately, following these steps results in a single learner that seems to have 'merged' everything from the different branches. I was hoping to get a list of length 3, but I get a list of length one instead.

R code:

library(data.table)
library(paradox)
library(mlr3)
library(mlr3filters)
library(mlr3learners)
library(mlr3misc)
library(mlr3pipelines)
library(mlr3tuning)
library(mlr3viz)

learner <- lrn("classif.rpart", predict_type = "prob")
learner$param_set$values <- list(
  cp = 0,
  maxdepth = 21,
  minbucket = 12,
  minsplit = 24
)

graph = 
  po("imputehist") %>>%
  po("branch", c("nop", "classbalancing_up", "classbalancing_down")) %>>%
  gunion(list(
    po("nop", id = "null"),
    po("classbalancing", id = "classbalancing_down", ratio = 2, reference = 'minor'), 
    po("classbalancing", id = "classbalancing_up", ratio = 2, reference = 'major')
  )) %>>%
  gunion(list(
    po("learner", learner, id = "learner_null"),
    po("learner", learner, id = "learner_classbalancing_down"),
    po("learner", learner, id = "learner_classbalancing_up")
  )) %>>%
  po("unbranch")

plot(graph)

tr <- mlr3::resample(tsk("iris"), graph, rsmp("holdout"))

tr$learners

Question 1: How can I get three different results instead?

Question 2: How can I benchmark these three results within the pipeline following unbranching?

Question 3: What if I want to add multiple learners within each branch? I'd like some of the learners to be inserted with fixed hyperparameters, while for others I'd like to have their hyperparameters tuned with AutoTuner within each branch. Then, I'd like to benchmark them within each branch and select the 'best' from each branch. Finally, I'd like to benchmark the three best learners to end up with the single best.

Many thanks.


Answer 1:


I think that I've found the answer to what I'm looking for. In brief, what I'd like to do is:

Create a graph pipeline with multiple learners. I'd like some of the learners to be inserted with fixed hyperparameters, while for others I'd like to have their hyperparameters tuned. Then, I'd like to benchmark them and select the 'best' one. I'd also like the benchmarking of learners to happen under different class balancing strategies, namely, do nothing, up-sample and down-sample. The optimal parameter settings for the up/down-sampling (e.g. ratio) would also be determined during tuning.

Two examples below, one that almost does what I want, the other doing exactly what I want.

Example 1: Build a pipe that includes all learners, that is, learners with fixed hyperparameters, as well as learners whose hyperparameters require tuning

As will be shown, it seems like a bad idea to have both kinds of learners (i.e. with fixed and tunable hyperparameters), because tuning the pipe disregards the learners with tunable hyperparameters.

####################################################################################
# Build Machine Learning pipeline that:
# 1. Imputes missing values (optional).
# 2. Tunes and benchmarks a range of learners.
# 3. Handles imbalanced data in different ways.
# 4. Identifies optimal learner for the task at hand.

# Abbreviations
# 1. td: Tuned. Learner already tuned with optimal hyperparameters, as found empirically by Probst et al. (2019). See http://jmlr.csail.mit.edu/papers/volume20/18-444/18-444.pdf
# 2. tn: Tuner. Optimal hyperparameters for the learner to be determined within the Tuner.
# 3. raw: Raw dataset, i.e. class imbalances were not treated in any way.
# 4. up: Data upsampling to balance class imbalances.
# 5. down: Data downsampling to balance class imbalances.

# References
# Probst et al. (2019). http://jmlr.csail.mit.edu/papers/volume20/18-444/18-444.pdf
####################################################################################

# The data-splitting code below uses dplyr/tibble verbs, so load them explicitly
library(dplyr)
library(tibble)

task <- tsk('sonar')

# Indices for splitting data into training and test sets
train.idx <- task$data() %>%
  select(Class) %>%
  rownames_to_column %>%
  group_by(Class) %>%
  sample_frac(2 / 3) %>% # Stratified sample to maintain proportions between classes.
  ungroup %>%
  select(rowname) %>%
  deframe %>%
  as.numeric
test.idx <- setdiff(seq_len(task$nrow), train.idx)

# Define training and test sets in task format
task_train <- task$clone()$filter(train.idx)
task_test  <- task$clone()$filter(test.idx)

# Define class balancing strategies
class_counts <- table(task_train$truth())
upsample_ratio <- class_counts[class_counts == max(class_counts)] / 
  class_counts[class_counts == min(class_counts)]
downsample_ratio <- 1 / upsample_ratio

# 1. Enrich minority class by factor 'ratio'
po_over <- po("classbalancing", id = "up", adjust = "minor", 
              reference = "minor", shuffle = FALSE, ratio = upsample_ratio)

# 2. Reduce majority class by factor '1/ratio'
po_under <- po("classbalancing", id = "down", adjust = "major", 
               reference = "major", shuffle = FALSE, ratio = downsample_ratio)

# 3. No class balancing
po_raw <- po("nop", id = "raw") # Pipe operator for 'do nothing' ('nop'), i.e. don't up/down-balance the classes.

# We will be using an XGBoost learner throughout with different hyperparameter settings.

# Define XGBoost learner with the optimal hyperparameters of Probst et al.
# Learner will be added to the pipeline later on, in conjunction with and without class balancing.
xgb_td <- lrn("classif.xgboost", predict_type = 'prob')
xgb_td$param_set$values <- list(
  booster = "gbtree", 
  nrounds = 2563, 
  max_depth = 11, 
  min_child_weight = 1.75, 
  subsample = 0.873, 
  eta = 0.052,
  colsample_bytree = 0.713,
  colsample_bylevel = 0.638,
  lambda = 0.101,
  alpha = 0.894
)

xgb_td_raw <- GraphLearner$new(
  po_raw %>>%
    po('learner', xgb_td, id = 'xgb_td'),
  predict_type = 'prob'
)

xgb_tn_raw <- GraphLearner$new(
  po_raw %>>%
    po('learner', lrn("classif.xgboost",
                      predict_type = 'prob'), id = 'xgb_tn'),
  predict_type = 'prob'
)

xgb_td_up <- GraphLearner$new(
  po_over %>>%
    po('learner', xgb_td, id = 'xgb_td'),
  predict_type = 'prob'
)

xgb_tn_up <- GraphLearner$new(
  po_over %>>%
    po('learner', lrn("classif.xgboost",
                      predict_type = 'prob'), id = 'xgb_tn'),
  predict_type = 'prob'
)

xgb_td_down <- GraphLearner$new(
  po_under %>>%
    po('learner', xgb_td, id = 'xgb_td'),
  predict_type = 'prob'
)

xgb_tn_down <- GraphLearner$new(
  po_under %>>%
    po('learner', lrn("classif.xgboost",
                      predict_type = 'prob'), id = 'xgb_tn'),
  predict_type = 'prob'
)

learners_all <- list(
  xgb_td_raw,
  xgb_tn_raw,
  xgb_td_up,
  xgb_tn_up,
  xgb_td_down,
  xgb_tn_down
)
names(learners_all) <- sapply(learners_all, function(x) x$id)

# Create pipeline as a graph. This way, pipeline can be plotted. Pipeline can then be converted into a learner with GraphLearner$new(pipeline).
# Pipeline is a collection of Graph Learners (type ?GraphLearner in the command line for info).
# Each GraphLearner is a td or tn model (see abbreviations above) with or without class balancing.
# Up/down or no sampling happens within each GraphLearner, otherwise an error during tuning indicates that there are >= 2 data sources.
# Up/down or no sampling within each GraphLearner can be specified by chaining the relevant pipe operators (function po(); type ?PipeOp in command line) with the PipeOp of each learner.
graph <- 
  #po("imputehist") %>>% # Optional. Impute missing values only when using classifiers that can't handle them (e.g. Random Forest).
  po("branch", names(learners_all)) %>>%
  gunion(unname(learners_all)) %>>%
  po("unbranch")

graph$plot() # Plot pipeline

pipe <- GraphLearner$new(graph) # Convert pipeline to learner
pipe$predict_type <- 'prob' # Don't forget to specify we want to predict probabilities and not classes.

ps_table <- as.data.table(pipe$param_set)
View(ps_table[, 1:4])

# Set hyperparameter ranges for the tunable learners
ps_xgboost <- ps_table$id %>%
  lapply(
    function(x) {
      if (grepl('_tn', x)) {
        if (grepl('.booster', x)) {
          ParamFct$new(x, levels = "gbtree")
        } else if (grepl('.nrounds', x)) {
          ParamInt$new(x, lower = 100, upper = 110)
        } else if (grepl('.max_depth', x)) {
          ParamInt$new(x, lower = 3, upper = 10)
        } else if (grepl('.min_child_weight', x)) {
          ParamDbl$new(x, lower = 0, upper = 10)
        } else if (grepl('.subsample', x)) {
          ParamDbl$new(x, lower = 0, upper = 1)
        } else if (grepl('.eta', x)) {
          ParamDbl$new(x, lower = 0.1, upper = 0.6)
        } else if (grepl('.colsample_bytree', x)) {
          ParamDbl$new(x, lower = 0.5, upper = 1)
        } else if (grepl('.gamma', x)) {
          ParamDbl$new(x, lower = 0, upper = 5)
        }
      }
    }
  )
ps_xgboost <- Filter(Negate(is.null), ps_xgboost)
ps_xgboost <- ParamSet$new(ps_xgboost)

# Set parameter ranges for the class-balancing strategies
ps_class_balancing <- ps_table$id %>%
  lapply(
    function(x) {
      if (all(grepl('up.', x), grepl('.ratio', x))) {
        ParamDbl$new(x, lower = 1, upper = upsample_ratio)
      } else if (all(grepl('down.', x), grepl('.ratio', x))) {
        ParamDbl$new(x, lower = downsample_ratio, upper = 1)
      }
    }
  )
ps_class_balancing <- Filter(Negate(is.null), ps_class_balancing)
ps_class_balancing <- ParamSet$new(ps_class_balancing)

# Define parameter set
param_set <- ParamSetCollection$new(list(
  ParamSet$new(list(pipe$param_set$params$branch.selection$clone())), # ParamFct can be copied.
  ps_xgboost, 
  ps_class_balancing
))

# Add dependencies. For instance, we can only set the mtry value if the pipe is configured to use the Random Forest (ranger).
# In a similar manner, we want to add a dependency between, e.g., hyperparameter "raw.xgb_td.xgb_tn.booster" and branch "raw.xgb_td"
# See https://mlr3gallery.mlr-org.com/tuning-over-multiple-learners/
param_set$ids()[-1] %>%
  lapply(
    function(x) {
      aux <- names(learners_all) %>%
        sapply(
          function(y) {
            grepl(y, x)
          }
        )
      aux <- names(aux[aux])
      param_set$add_dep(x, "branch.selection", 
                        CondEqual$new(aux))
    }
  )

# Set up tuning instance
instance <- TuningInstance$new(
  task = task_train,
  learner = pipe,
  resampling = rsmp('cv', folds = 2),
  measures = msr("classif.bbrier"),
  #measures = prc_micro,
  param_set,
  terminator = term("evals", n_evals = 3))
tuner <- TunerRandomSearch$new()

# Tune pipe learner to find best-performing branch
tuner$tune(instance)

instance$result
instance$archive() 
instance$archive(unnest = "tune_x") # Unnest the tuner search space values

pipe$param_set$values <- instance$result$params
pipe$train(task_train)

pred <- pipe$predict(task_test)
pred$confusion

Note that the tuner chooses to disregard the tuning of the tunable learners and focuses on the pre-tuned learners only. This can be confirmed by inspecting instance$result: the only things that have been tuned for the tunable learners are the class-balancing parameters, which are not actually learner hyperparameters.

Example 2: Build a pipe that includes tunable learners only, find the 'best' one, and then benchmark it against the learners with fixed hyperparameters at a second stage.

Step 1: Build pipe for tunable learners

learners_all <- list(
  #xgb_td_raw,
  xgb_tn_raw,
  #xgb_td_up,
  xgb_tn_up,
  #xgb_td_down,
  xgb_tn_down
)
names(learners_all) <- sapply(learners_all, function(x) x$id)

# Create pipeline as a graph. This way, pipeline can be plotted. Pipeline can then be converted into a learner with GraphLearner$new(pipeline).
# Pipeline is a collection of Graph Learners (type ?GraphLearner in the command line for info).
# Each GraphLearner is a td or tn model (see abbreviations above) with or without class balancing.
# Up/down or no sampling happens within each GraphLearner, otherwise an error during tuning indicates that there are >= 2 data sources.
# Up/down or no sampling within each GraphLearner can be specified by chaining the relevant pipe operators (function po(); type ?PipeOp in command line) with the PipeOp of each learner.
graph <- 
  #po("imputehist") %>>% # Optional. Impute missing values only when using classifiers that can't handle them (e.g. Random Forest).
  po("branch", names(learners_all)) %>>%
  gunion(unname(learners_all)) %>>%
  po("unbranch")

graph$plot() # Plot pipeline

pipe <- GraphLearner$new(graph) # Convert pipeline to learner
pipe$predict_type <- 'prob' # Don't forget to specify we want to predict probabilities and not classes.

ps_table <- as.data.table(pipe$param_set)
View(ps_table[, 1:4])

ps_xgboost <- ps_table$id %>%
  lapply(
    function(x) {
      if (grepl('_tn', x)) {
        if (grepl('.booster', x)) {
          ParamFct$new(x, levels = "gbtree")
        } else if (grepl('.nrounds', x)) {
          ParamInt$new(x, lower = 100, upper = 110)
        } else if (grepl('.max_depth', x)) {
          ParamInt$new(x, lower = 3, upper = 10)
        } else if (grepl('.min_child_weight', x)) {
          ParamDbl$new(x, lower = 0, upper = 10)
        } else if (grepl('.subsample', x)) {
          ParamDbl$new(x, lower = 0, upper = 1)
        } else if (grepl('.eta', x)) {
          ParamDbl$new(x, lower = 0.1, upper = 0.6)
        } else if (grepl('.colsample_bytree', x)) {
          ParamDbl$new(x, lower = 0.5, upper = 1)
        } else if (grepl('.gamma', x)) {
          ParamDbl$new(x, lower = 0, upper = 5)
        }
      }
    }
  )
ps_xgboost <- Filter(Negate(is.null), ps_xgboost)
ps_xgboost <- ParamSet$new(ps_xgboost)

ps_class_balancing <- ps_table$id %>%
  lapply(
    function(x) {
      if (all(grepl('up.', x), grepl('.ratio', x))) {
        ParamDbl$new(x, lower = 1, upper = upsample_ratio)
      } else if (all(grepl('down.', x), grepl('.ratio', x))) {
        ParamDbl$new(x, lower = downsample_ratio, upper = 1)
      }
    }
  )
ps_class_balancing <- Filter(Negate(is.null), ps_class_balancing)
ps_class_balancing <- ParamSet$new(ps_class_balancing)

param_set <- ParamSetCollection$new(list(
  ParamSet$new(list(pipe$param_set$params$branch.selection$clone())), # ParamFct can be copied.
  ps_xgboost, 
  ps_class_balancing
))

# Add dependencies. For instance, we can only set the mtry value if the pipe is configured to use the Random Forest (ranger).
# In a similar manner, we want to add a dependency between, e.g., hyperparameter "raw.xgb_td.xgb_tn.booster" and branch "raw.xgb_td"
# See https://mlr3gallery.mlr-org.com/tuning-over-multiple-learners/
param_set$ids()[-1] %>%
  lapply(
    function(x) {
      aux <- names(learners_all) %>%
        sapply(
          function(y) {
            grepl(y, x)
          }
        )
      aux <- names(aux[aux])
      param_set$add_dep(x, "branch.selection", 
                        CondEqual$new(aux))
    }
  )

# Set up tuning instance
instance <- TuningInstance$new(
  task = task_train,
  learner = pipe,
  resampling = rsmp('cv', folds = 2),
  measures = msr("classif.bbrier"),
  #measures = prc_micro,
  param_set,
  terminator = term("evals", n_evals = 3))
tuner <- TunerRandomSearch$new()

# Tune pipe learner to find best-performing branch
tuner$tune(instance)

instance$result
instance$archive() 
instance$archive(unnest = "tune_x") # Unnest the tuner search space values

pipe$param_set$values <- instance$result$params
pipe$train(task_train)

pred <- pipe$predict(task_test)
pred$confusion

Note that now instance$result returns optimal results for the learners' hyperparameters too, and not just for the class-balancing parameters.

Step 2: Benchmark 'best' tunable learner (now tuned) and the learners that have fixed hyperparameters

# Define resampling and instantiate it so that the same splits are always used

resampling <- rsmp("cv", folds = 2)

set.seed(123)
resampling$instantiate(task_train)

bmr <- benchmark(
  design = benchmark_grid(
    task_train,
    learner = list(pipe, xgb_td_raw, xgb_td_up, xgb_td_down),
    resampling
  ),
  store_models = TRUE # Only needed if you want to inspect the models
)

bmr$aggregate(msr("classif.bbrier"))

A few issues to consider

  1. I should probably have created a second, separate pipe for the learners that have fixed hyperparameters, so that at least the class-balancing parameters are tuned for them as well. The two pipes (tunable and fixed hyperparameters) would then be benchmarked against each other with benchmark(). A sketch of this idea follows after this list.
  2. I should probably have used the same resampling strategy from beginning to end, i.e. instantiate the resampling strategy right before tuning the first pipe, so that the same splits are also used for the second pipe and for the final benchmark.
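Regarding the first point, here is a minimal, untested sketch of what such a second pipe could look like, reusing the objects defined earlier (xgb_td_raw, xgb_td_up, xgb_td_down, upsample_ratio, downsample_ratio, task_train, resampling) and the same (older) mlr3tuning/paradox API used throughout this answer; only the branch choice and the class-balancing ratios are tuned, since the learner hyperparameters are fixed:

# Second pipe: fixed-hyperparameter (td) learners only
learners_td <- list(xgb_td_raw, xgb_td_up, xgb_td_down)
names(learners_td) <- sapply(learners_td, function(x) x$id)

graph_td <- po("branch", names(learners_td)) %>>%
  gunion(unname(learners_td)) %>>%
  po("unbranch")

pipe_td <- GraphLearner$new(graph_td)
pipe_td$predict_type <- 'prob'

# Tune only the class-balancing ratios (plus the branch selection)
ps_td_table <- as.data.table(pipe_td$param_set)
ps_td_balancing <- ps_td_table$id %>%
  lapply(
    function(x) {
      if (all(grepl('up.', x), grepl('.ratio', x))) {
        ParamDbl$new(x, lower = 1, upper = upsample_ratio)
      } else if (all(grepl('down.', x), grepl('.ratio', x))) {
        ParamDbl$new(x, lower = downsample_ratio, upper = 1)
      }
    }
  )
ps_td_balancing <- Filter(Negate(is.null), ps_td_balancing)

param_set_td <- ParamSetCollection$new(list(
  ParamSet$new(list(pipe_td$param_set$params$branch.selection$clone())),
  ParamSet$new(ps_td_balancing)
))
# (Dependencies between the ratios and branch.selection could be added as above.)

instance_td <- TuningInstance$new(
  task = task_train,
  learner = pipe_td,
  resampling = rsmp('cv', folds = 2),
  measures = msr("classif.bbrier"),
  param_set_td,
  terminator = term("evals", n_evals = 3))
TunerRandomSearch$new()$tune(instance_td)

pipe_td$param_set$values <- instance_td$result$params

# Benchmark the two tuned pipes against each other on the same resampling
bmr2 <- benchmark(
  design = benchmark_grid(task_train, list(pipe, pipe_td), resampling),
  store_models = TRUE
)
bmr2$aggregate(msr("classif.bbrier"))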

Comments/validation more than welcome.

(special thanks to missuse for the constructive comments)




Answer 2:


The simplest way to benchmark several pipelines is to define the appropriate graphs and use the benchmark function:

library(paradox)
library(mlr3)
library(mlr3pipelines)
library(mlr3tuning)

learner <- lrn("classif.rpart", predict_type = "prob")
learner$param_set$values <- list(
  cp = 0,
  maxdepth = 21,
  minbucket = 12,
  minsplit = 24
)

Create the tree graphs:

Graph 1: just imputehist

graph_nop <- po("imputehist") %>>%
  learner

Graph 2: imputehist and undersample the majority class (ratio relative to the majority class)

graph_down <- po("imputehist") %>>%
  po("classbalancing", id = "undersample", adjust = "major", 
     reference = "major", shuffle = FALSE, ratio = 1/2) %>>%
  learner

Graph 3: imputehist and oversample the minority class (ratio relative to the minority class)

graph_up <- po("imputehist") %>>%
  po("classbalancing", id = "oversample", adjust = "minor", 
     reference = "minor", shuffle = FALSE, ratio = 2) %>>%
  learner

Convert graphs to learners and set predict_type

graph_nop <-  GraphLearner$new(graph_nop)
graph_nop$predict_type <- "prob"

graph_down <- GraphLearner$new(graph_down)
graph_down$predict_type <- "prob"

graph_up <- GraphLearner$new(graph_up)
graph_up$predict_type <- "prob"

Define the resampling and instantiate it so that the same split is always used:

hld <- rsmp("holdout")

set.seed(123)
hld$instantiate(tsk("sonar"))

Benchmark

bmr <- benchmark(design = benchmark_grid(task = tsk("sonar"),
                                        learner = list(graph_nop,
                                                       graph_up,
                                                       graph_down),
                                        hld),
                store_models = TRUE) #only needed if you want to inspect the models

Check the results using different measures:

bmr$aggregate(msr("classif.auc"))

   nr  resample_result task_id                           learner_id resampling_id iters classif.auc
1:  1 <ResampleResult>   sonar             imputehist.classif.rpart       holdout     1   0.7694257
2:  2 <ResampleResult>   sonar  imputehist.oversample.classif.rpart       holdout     1   0.7360642
3:  3 <ResampleResult>   sonar imputehist.undersample.classif.rpart       holdout     1   0.7668919

bmr$aggregate(msr("classif.ce"))

   nr  resample_result task_id                           learner_id resampling_id iters classif.ce
1:  1 <ResampleResult>   sonar             imputehist.classif.rpart       holdout     1  0.3043478
2:  2 <ResampleResult>   sonar  imputehist.oversample.classif.rpart       holdout     1  0.3188406
3:  3 <ResampleResult>   sonar imputehist.undersample.classif.rpart       holdout     1  0.2898551

This can also be performed within one pipeline with branching, but then one needs to define the param set and use a tuner:

graph2 <- 
  po("imputehist") %>>%
  po("branch", c("nop", "classbalancing_up", "classbalancing_down")) %>>%
  gunion(list(
    po("nop", id = "nop"),
    po("classbalancing", id = "classbalancing_up", ratio = 2, reference = 'major'),
    po("classbalancing", id = "classbalancing_down", ratio = 2, reference = 'minor') 
  )) %>>%
  po("unbranch") %>>%
  learner

graph2$plot()

Note that the unbranch happens before the learner, since one (always the same) learner is being used. Convert the graph to a learner and set predict_type:

graph2 <- GraphLearner$new(graph2)
graph2$predict_type <- "prob"

Define the param set. In this case it contains just the different branch options:

ps <- ParamSet$new(
  list(
    ParamFct$new("branch.selection", levels = c("nop", "classbalancing_up", "classbalancing_down"))
  ))

In general you would also want to add learner hyperparameters such as cp and minsplit for rpart, as well as the over-/undersampling ratios; a sketch of such an extended param set is given below.
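A minimal, untested sketch of such an extended param set, staying with the older paradox API used above. The parameter ids are assumptions derived from the PipeOp ids in graph2 (check as.data.table(graph2$param_set)$id for the exact names on your system), and the bounds are purely illustrative:

ps_ext <- ParamSet$new(list(
  ParamFct$new("branch.selection",
               levels = c("nop", "classbalancing_up", "classbalancing_down")),
  # rpart hyperparameters; the learner sits after the unbranch, so it is active in every branch
  ParamDbl$new("classif.rpart.cp", lower = 0.001, upper = 0.1),
  ParamInt$new("classif.rpart.minsplit", lower = 1, upper = 30),
  # over-/undersampling ratios (assumed ids; illustrative bounds)
  ParamDbl$new("classbalancing_up.ratio", lower = 1, upper = 2),
  ParamDbl$new("classbalancing_down.ratio", lower = 1, upper = 2)
))

# Only tune each ratio when its branch is actually selected
ps_ext$add_dep("classbalancing_up.ratio", "branch.selection",
               CondEqual$new("classbalancing_up"))
ps_ext$add_dep("classbalancing_down.ratio", "branch.selection",
               CondEqual$new("classbalancing_down"))

With more than just the branch factor in the search space, a grid search with resolution 1 (as used below) would no longer make sense; you would pass ps_ext to the tuning instance together with a higher resolution or, e.g., a random search.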

Create a tuning instance and grid search with resolution 1 since no other parameters are tuned. The tuner will iterate through different pipeline branches as defined in the paramset.

instance <- TuningInstance$new(
  task = tsk("sonar"),
  learner = graph2,
  resampling = hld,
  measures = msr("classif.auc"),
  param_set = ps,
  terminator = term("none")
)


tuner <- tnr("grid_search", resolution = 1)
set.seed(321)
tuner$tune(instance)

Check the result:

instance$archive(unnest = "tune_x")

   nr batch_nr  resample_result task_id
1:  1        1 <ResampleResult>   sonar
2:  2        2 <ResampleResult>   sonar
3:  3        3 <ResampleResult>   sonar
                                                                            learner_id resampling_id iters params
1: imputehist.branch.null.classbalancing_up.classbalancing_down.unbranch.classif.rpart       holdout     1 <list>
2: imputehist.branch.null.classbalancing_up.classbalancing_down.unbranch.classif.rpart       holdout     1 <list>
3: imputehist.branch.null.classbalancing_up.classbalancing_down.unbranch.classif.rpart       holdout     1 <list>
   warnings errors classif.auc    branch.selection
1:        0      0   0.7842061 classbalancing_down
2:        0      0   0.7673142   classbalancing_up
3:        0      0   0.7694257                 nop

Even though the above example is possible, I think mlr3pipelines is designed so that you tune learner hyperparameters jointly with the preprocessing steps, while also selecting the best preprocessing steps (via branching).

Question 3 has multiple sub-questions, some of which would take quite a lot of code and explanation to answer. I suggest checking the mlr3book as well as the mlr3gallery. A minimal AutoTuner sketch is given below as a starting point.
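That said, here is a minimal, untested sketch of the AutoTuner part: wrapping a single learner in an AutoTuner makes its hyperparameters tune themselves via an inner resampling whenever the learner is trained, and the resulting object is itself a Learner, so it can be placed in a branch or benchmarked against fixed-hyperparameter learners. The constructor arguments follow the older mlr3tuning API used above (they have been renamed in later releases), and the parameter ranges are purely illustrative:

# A tunable rpart wrapped in an AutoTuner
rpart_tn <- lrn("classif.rpart", predict_type = "prob")

rpart_at <- AutoTuner$new(
  learner = rpart_tn,
  resampling = rsmp("cv", folds = 2),   # inner resampling used for tuning
  measures = msr("classif.auc"),
  tune_ps = ParamSet$new(list(
    ParamDbl$new("cp", lower = 0.001, upper = 0.1),
    ParamInt$new("minsplit", lower = 1, upper = 30)
  )),
  terminator = term("evals", n_evals = 10),
  tuner = tnr("random_search")
)

# rpart_at behaves like any other Learner, e.g. benchmark it against a fixed pipeline
bmr_at <- benchmark(benchmark_grid(tsk("sonar"), list(rpart_at, graph_nop), hld))
bmr_at$aggregate(msr("classif.auc"))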

EDIT: an mlr3 gallery post, https://mlr3gallery.mlr-org.com/posts/2020-03-30-imbalanced-data/, is relevant to this question.



Source: https://stackoverflow.com/questions/61014457/mlr3-pipeops-create-branches-with-different-data-transformations-and-benchmark
