How to use expand.grid values to run various model hyperparameter combinations for ranger in R

I've seen various posts on how to select the independent variables for a model by using expand.grid and then create a formula based on that selection. However, I prepare my input tables beforehand and store them in a list.

Input_list <- list(iris1 = iris, iris2 = iris)  # let's assume these are different input tables

I'm rather interested in trying all the possible hyperparameter combinations for a given algorithm (here: Random Forest using ranger) for my list of input tables. I do the following to set up the grid:

hyper_grid <- expand.grid(
  Input_table = names(Input_list),
  Trees = c(10, 20),
  Importance = c("none", "impurity"),
  Classification = TRUE,
  Repeats = 1:5,
  Target = "Species")

> head(hyper_grid)
  Input_table Trees Importance Classification Repeats  Target
1       iris1    10       none           TRUE       1 Species
2       iris2    10       none           TRUE       1 Species
3       iris1    20       none           TRUE       1 Species
4       iris2    20       none           TRUE       1 Species
5       iris1    10   impurity           TRUE       1 Species
6       iris2    10   impurity           TRUE       1 Species

My question is, what is the best way to pass these values to the model? Currently I'm using a for loop:

for (i in seq_len(nrow(hyper_grid))) {
  RF_train <- ranger(dependent.variable.name = hyper_grid[i, "Target"],
    data = Input_list[[hyper_grid[i, "Input_table"]]],  # referring to the named object in the list
    num.trees = hyper_grid[i, "Trees"],
    importance = hyper_grid[i, "Importance"],
    classification = hyper_grid[i, "Classification"])  # otherwise regression is performed
}

iterating over each row of the grid. But for one, I now have to tell the model explicitly whether it is classification or regression. I assume the factor Species is converted to numeric factor levels, so regression is performed by default. Is there a way to prevent this, and could something like apply be used for this task instead? Iterating this way also results in messy function calls:

 ranger(dependent.variable.name = hyper_grid[i, "Target"], data = Input_list[[hyper_grid[i, "Input_table"]]], num.trees = hyper_grid[i, "Trees"], importance = hyper_grid[i, "Importance"], classification = hyper_grid[i, "Classification"])

Second: in reality, the output of the model is obviously not printed. Instead, I immediately capture the important results (mainly RF_train$confusion.matrix) and write them into an extended version of hyper_grid, on the same row as the input parameters. Is this too costly performance-wise? I ask because if I store the ranger objects themselves, I run into memory issues at some point.

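I think it is cleanest to wrap the training and the extraction of the values you need into a function. The dots (...) are needed for use with purrr::pmap below: pmap passes every column of the grid as a named argument, and the dots absorb the columns the function does not use (here Repeats).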

fit_and_extract_metrics <- function(Target, Input_table, Trees, Importance, Classification, ...) {
  RF_train <- ranger(dependent.variable.name = Target,
    data = Input_list[[Input_table]],  # referring to the named object in the list
    num.trees = Trees,
    importance = Importance,
    classification = Classification)  # otherwise regression is performed

  # Return only the metrics of interest, not the fitted model
  data.frame(Prediction_error = RF_train$prediction.error,
             True_positive = RF_train$confusion.matrix[1])  # first cell of the confusion matrix
}

Then you can add the results as a column by mapping over the rows using for example purrr::pmap:

hyper_grid$res <- purrr::pmap(hyper_grid, fit_and_extract_metrics)
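To make the pieces above concrete, here is a minimal end-to-end sketch. It assumes the ranger and purrr packages are installed. The names small_grid and fit_metrics are made up for illustration, and stringsAsFactors = FALSE is my own addition: expand.grid() turns character vectors into factors by default, and plain strings are safer for looking up list elements and column names. The classification argument is left out on the assumption that ranger grows a classification forest on its own when the target column is a factor, as Species is; the Tree_type column lets you check that.

```r
library(ranger)

Input_list <- list(iris1 = iris, iris2 = iris)

# Hypothetical reduced grid; stringsAsFactors = FALSE keeps names as character
small_grid <- expand.grid(
  Input_table = names(Input_list),
  Trees = c(10, 20),
  Target = "Species",
  stringsAsFactors = FALSE)

# Hypothetical helper: fit one forest and return only small summaries
fit_metrics <- function(Target, Input_table, Trees, ...) {
  fit <- ranger(dependent.variable.name = Target,
                data = Input_list[[Input_table]],
                num.trees = Trees)
  data.frame(Prediction_error = fit$prediction.error,
             Tree_type = fit$treetype)
}

# One call per grid row; grid columns are matched to arguments by name
small_grid$res <- purrr::pmap(small_grid, fit_metrics)
small_grid$res[[1]]
```

If Tree_type shows "Regression" for a target you expected to be categorical, checking the class of the target column in the input table is a reasonable first step.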

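Mapping this way applies the function one grid row at a time. Each fitted ranger object exists only inside the function call and can be garbage-collected once the call returns, because only the small data frame of metrics is kept. You should therefore not run into memory issues.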
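The memory point can be checked directly. The sketch below, assuming ranger is installed, compares the size of a fitted forest with the size of the small data frame kept from it. write.forest = FALSE is an existing ranger option that skips storing the forest when no later prediction is needed; whether it makes a difference for your data is something to measure rather than assume.

```r
library(ranger)

fit <- ranger(dependent.variable.name = "Species", data = iris,
              num.trees = 500)

# Keep only the summaries that will be written into the grid
metrics <- data.frame(Prediction_error = fit$prediction.error)

object.size(fit)      # full ranger object, including the stored forest
object.size(metrics)  # the part that is actually kept

# The out-of-bag prediction error is still computed without the stored forest
fit_light <- ranger(dependent.variable.name = "Species", data = iris,
                    num.trees = 500, write.forest = FALSE)
object.size(fit_light)
fit_light$prediction.error
```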

The result of purrr::pmap is a list, so the column res is a list column holding one single-row data frame per grid row. tidyr::unnest spreads the columns of those data frames across your data frame:

tidyr::unnest(hyper_grid, res)
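To see what the list column looks like before and after unnesting without fitting any models, here is a small sketch. fake_metrics and toy_grid are hypothetical stand-ins for the real fitting function and grid, and the numbers they produce are placeholders, not model results.

```r
# Stand-in for the model fit (hypothetical): returns a one-row data frame
fake_metrics <- function(Trees, ...) {
  data.frame(Prediction_error = 1 / Trees, True_positive = 50L)
}

toy_grid <- expand.grid(Trees = c(10, 20), Repeats = 1:2)
toy_grid$res <- purrr::pmap(toy_grid, fake_metrics)

# res is a list column: one data frame per grid row
str(toy_grid$res[[1]])

# unnest() replaces res with the columns of those data frames
flat <- tidyr::unnest(toy_grid, res)
names(flat)
```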

I think this approach is very elegant, but it requires some tidyverse knowledge. I highly recommend the book R for Data Science if you want to know more about that. Chapter 25 (Many models) describes an approach similar to the one I'm taking here.
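If you would rather stay in base R, roughly the same pattern can be written with lapply and do.call(rbind, ...). This is only a sketch under the same assumptions as above (ranger installed, character columns in the grid); base_grid and grid_results are illustrative names.

```r
library(ranger)

Input_list <- list(iris1 = iris, iris2 = iris)
base_grid <- expand.grid(Input_table = names(Input_list),
                         Trees = c(10, 20),
                         stringsAsFactors = FALSE)

# One small data frame of metrics per grid row
res <- lapply(seq_len(nrow(base_grid)), function(i) {
  fit <- ranger(dependent.variable.name = "Species",
                data = Input_list[[base_grid$Input_table[i]]],
                num.trees = base_grid$Trees[i])
  data.frame(Prediction_error = fit$prediction.error)
})

# Bind the metrics back onto the parameters that produced them
grid_results <- cbind(base_grid, do.call(rbind, res))
grid_results
```

As with pmap, each fitted forest lives only inside the anonymous function, so only the metrics accumulate in memory.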

