I am having trouble generating the following dummy variables in R:
I'm analyzing yearly time series data (time period 1948-2009). I have two questions:
The other answers here offer direct routes to accomplish this task, one that many models (e.g. `lm`) will do for you internally anyway. Nonetheless, here are ways to make dummy variables with Max Kuhn's popular `caret` and `recipes` packages. While somewhat more verbose, they both scale easily to more complicated situations, and fit neatly into their respective frameworks.
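For comparison, here is a minimal base R sketch of what "done internally" means. It assumes a toy data frame shaped like the one used in this answer, with `letter` stored explicitly as a factor; `model.matrix` shows the treatment-coded columns that `lm` builds from the same formula.

```r
# Toy data mirroring the example in this answer; `letter` is an explicit factor.
df <- data.frame(letter = factor(rep(c("a", "b", "c"), each = 2)),
                 y = 1:6)

# Design matrix: an intercept plus one dummy per non-reference level.
mm <- model.matrix(y ~ letter, data = df)
colnames(mm)

# lm() builds the same encoding itself, so its coefficients carry the same names.
fit <- lm(y ~ letter, data = df)
names(coef(fit))
```

The first level (`a`) acts as the reference category, so it gets no column of its own.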
With `caret`, the relevant function is `dummyVars`, which has a `predict` method to apply it to a data frame:
df <- data.frame(letter = rep(c('a', 'b', 'c'), each = 2),
                 y = 1:6)
library(caret)
dummy <- dummyVars(~ ., data = df, fullRank = TRUE)
dummy
#> Dummy Variable Object
#>
#> Formula: ~.
#> 2 variables, 1 factors
#> Variables and levels will be separated by '.'
#> A full rank encoding is used
predict(dummy, df)
#>   letter.b letter.c y
#> 1        0        0 1
#> 2        0        0 2
#> 3        1        0 3
#> 4        1        0 4
#> 5        0        1 5
#> 6        0        1 6
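The example above uses `fullRank = TRUE`, which drops the first level to avoid perfect collinearity. As a contrasting sketch, assuming the same toy data with `letter` stored explicitly as a factor, leaving `fullRank` at its default of `FALSE` should give one indicator column per level instead:

```r
library(caret)

# Same toy data, with `letter` as an explicit factor.
df <- data.frame(letter = factor(rep(c("a", "b", "c"), each = 2)),
                 y = 1:6)

# fullRank = FALSE (the default) keeps an indicator column for every level.
one_hot <- dummyVars(~ letter, data = df, fullRank = FALSE)
encoded <- predict(one_hot, df)

# One column per level, so each row contains exactly one 1.
ncol(encoded)
rowSums(encoded)
```

The one-per-level form suits models that do not fit an intercept or that regularize; the full-rank form is the safer choice for ordinary linear models.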
With `recipes`, the relevant function is `step_dummy`:
library(recipes)
dummy_recipe <- recipe(y ~ letter, df) %>%
  step_dummy(letter)
dummy_recipe
#> Data Recipe
#>
#> Inputs:
#>
#>       role #variables
#>    outcome          1
#>  predictor          1
#>
#> Steps:
#>
#> Dummy variables from letter
Depending on context, extract the data with `prep` and either `bake` or `juice`:
# Prep and bake on new data...
dummy_recipe %>%
  prep() %>%
  bake(df)
#> # A tibble: 6 x 3
#>       y letter_b letter_c
#>
#> 1     1        0        0
#> 2     2        0        0
#> 3     3        1        0
#> 4     4        1        0
#> 5     5        0        1
#> 6     6        0        1
# ...or use `retain = TRUE` and `juice` to extract training data
dummy_recipe %>%
  prep(retain = TRUE) %>%
  juice()
#> # A tibble: 6 x 3
#>       y letter_b letter_c
#>
#> 1     1        0        0
#> 2     2        0        0
#> 3     3        1        0
#> 4     4        1        0
#> 5     5        0        1
#> 6     6        0        1
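To illustrate why the `prep`/`bake` split matters, here is a small sketch of applying a trained recipe to rows it has not seen. The data frames `train` and `new_obs` are made up for this example, and `letter` is stored as a factor with the same levels in both.

```r
library(recipes)

# Hypothetical training data and two new rows, sharing the same factor levels.
train <- data.frame(letter = factor(rep(c("a", "b", "c"), each = 2)),
                    y = 1:6)
new_obs <- data.frame(letter = factor(c("c", "a"), levels = c("a", "b", "c")),
                      y = c(7L, 8L))

# prep() learns the factor levels from the training data...
trained <- recipe(y ~ letter, data = train) %>%
  step_dummy(letter) %>%
  prep(training = train)

# ...and bake() applies that same encoding to the new rows.
baked <- bake(trained, new_data = new_obs)
baked
```

Because the levels are fixed at `prep` time, the new rows get the same `letter_b` and `letter_c` columns as the training data, even though level `b` does not appear among them. In recent `recipes` releases, `juice()` is marked as superseded in favor of `bake(new_data = NULL)`, which returns the processed training data in the same way.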