Some others have mentioned using `model.matrix` to get the design matrix. This is a good solution, but I find that I usually want to customize how missing values are treated or how rare levels are collapsed. So here is an alternative function that you can customize. As written, it gives missing values their own indicator column and can optionally leave out the least frequent level.
```
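library(data.table)

# One-hot encode the given columns of a data.table.
# Note: `:=` adds the indicator columns to DT by reference.
one_hot_encode <- function(DT, cols_to_encode, include_last = TRUE
                           , protected_NA_val = 'NA_MISSING'
                           ) {
  for (col in cols_to_encode) {
    # Levels ordered from most to least frequent; NA counts as a level if present
    level_freq <- DT[, sort(table(get(col), useNA = 'ifany')
                            , decreasing = TRUE)]
    level_names <- names(level_freq)
    # The NA level has a missing name, so give it a usable label
    level_names[is.na(level_names)] <- protected_NA_val
    # Optionally drop the least frequent level (it becomes the reference level)
    if (!include_last) {
      level_names <- level_names[-length(level_names)]
    }
    for (lev in level_names) {
      new_col_name <- paste('ONE_HOT', col, lev, sep = '_')
      # Start at 0, then set matching rows to 1
      DT[, (new_col_name) := 0]
      if (lev == protected_NA_val) {
        DT[is.na(get(col)), (new_col_name) := 1]
      } else {
        DT[get(col) == lev, (new_col_name) := 1]
      }
    }
  }
  return(DT)
}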
```
Applying this function to your dataset then looks like this:
```
DT <- data.table(
  time = c(20000616, 20000616, 20000616, 20000616, 20000616, 20000616)
  , hour = c(1, 2, 3, 4, 5, 6)
  , money = c(9.35, 6.22, 10.65, 11.42, 10.12, 7.32)
  , day = c(5, 5, 5, 5, 5, 5)
)
# Adds six indicator columns, ONE_HOT_hour_1 through ONE_HOT_hour_6
DT <- one_hot_encode(DT, 'hour')
```
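As for collapsing rare levels, one way to do it is as a separate step before encoding. The following is only a minimal sketch: `collapse_rare_levels`, its `min_count` and `other_val` arguments, and the `colour` example data are all hypothetical names made up for illustration, not part of data.table. It assumes you are happy to have the column converted to character.

```r
library(data.table)

# Hypothetical helper (an illustration, not part of data.table):
# replace every level seen fewer than `min_count` times with `other_val`.
collapse_rare_levels <- function(DT, col, min_count = 2, other_val = 'OTHER') {
  # Work on a character copy of the column
  vals <- as.character(DT[[col]])
  # table() ignores NA by default, so missing values are never counted as rare
  level_freq <- table(vals)
  rare_levels <- names(level_freq)[level_freq < min_count]
  # NA %in% rare_levels is FALSE, so missing values stay NA
  vals[vals %in% rare_levels] <- other_val
  # Overwrite the column by reference
  set(DT, j = col, value = vals)
  invisible(DT)
}

# Made-up example data: 'green' appears only once, and one value is missing
DT2 <- data.table(colour = c('red', 'red', 'blue', 'blue', 'green', NA))
collapse_rare_levels(DT2, 'colour')
print(DT2)
```

After this, `green` is relabelled as `OTHER` and the missing value is left alone, so passing `DT2` to `one_hot_encode(DT2, 'colour')` should give one indicator column each for `red`, `blue`, `OTHER` and `NA_MISSING`.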