Generate a dummy-variable

遇见更好的自我 2020-11-21 11:41

I have trouble generating the following dummy-variables in R:

I'm analyzing yearly time series data (time period 1948-2009). I have two questions:

17 answers

抹茶落季 (original poster)

2020-11-21 12:17

The other answers here offer direct routes to accomplish this task, which many modeling functions (e.g. lm) will do for you internally anyway. Nonetheless, here are ways to make dummy variables with Max Kuhn's popular caret and recipes packages. While somewhat more verbose, they both scale easily to more complicated situations, and fit neatly into their respective frameworks.
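To illustrate what a modeling function does internally, here is a minimal base R sketch that fits lm on a small toy data set and inspects the design matrix it built. The object names fit and design are illustrative only.

```r
# Toy data: a three-level factor and a numeric outcome
df <- data.frame(letter = factor(rep(c("a", "b", "c"), each = 2)),
                 y = 1:6)

# lm() expands the factor into treatment-coded dummy columns on its own
fit <- lm(y ~ letter, data = df)

# model.matrix() returns the design matrix that was used for the fit
design <- model.matrix(fit)
colnames(design)
```

The first level, a, serves as the reference level and gets no column of its own; its effect is absorbed into the intercept.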

caret::dummyVars

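With caret, the relevant function is dummyVars, which has a predict method to apply it to a data frame:

# factor() keeps `letter` a factor on R >= 4.0, where data.frame() no longer converts strings
df <- data.frame(letter = factor(rep(c('a', 'b', 'c'), each = 2)),
                 y = 1:6)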

library(caret)

dummy <- dummyVars(~ ., data = df, fullRank = TRUE)

dummy
#> Dummy Variable Object
#> 
#> Formula: ~.
#> 2 variables, 1 factors
#> Variables and levels will be separated by '.'
#> A full rank encoding is used

predict(dummy, df)
#>   letter.b letter.c y
#> 1        0        0 1
#> 2        0        0 2
#> 3        1        0 3
#> 4        1        0 4
#> 5        0        1 5
#> 6        0        1 6

recipes::step_dummy

With recipes, the relevant function is step_dummy:

library(recipes)

dummy_recipe <- recipe(y ~ letter, df) %>% 
    step_dummy(letter)

dummy_recipe
#> Data Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor          1
#> 
#> Steps:
#> 
#> Dummy variables from letter

Depending on context, extract the data with prep and either bake or juice:

# Prep and bake on new data...
dummy_recipe %>% 
    prep() %>% 
    bake(df)
#> # A tibble: 6 x 3
#>       y letter_b letter_c
#>   <int>    <dbl>    <dbl>
#> 1     1        0        0
#> 2     2        0        0
#> 3     3        1        0
#> 4     4        1        0
#> 5     5        0        1
#> 6     6        0        1

# ...or use `retain = TRUE` and `juice` to extract training data
dummy_recipe %>% 
    prep(retain = TRUE) %>% 
    juice()
#> # A tibble: 6 x 3
#>       y letter_b letter_c
#>   <int>    <dbl>    <dbl>
#> 1     1        0        0
#> 2     2        0        0
#> 3     3        1        0
#> 4     4        1        0
#> 5     5        0        1
#> 6     6        0        1
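In recent versions of recipes, juice is documented as superseded, and the processed training data can instead be requested from bake with new_data = NULL. A minimal sketch, assuming a reasonably current recipes installation (the names prepped and baked are illustrative):

```r
library(recipes)

df <- data.frame(letter = factor(rep(c("a", "b", "c"), each = 2)),
                 y = 1:6)

# Define the recipe, add the dummy step, and estimate it on the training data
prepped <- recipe(y ~ letter, data = df) %>%
    step_dummy(letter) %>%
    prep()

# new_data = NULL returns the processed training data, as juice() does
baked <- bake(prepped, new_data = NULL)
baked
```

Passing a data frame as new_data applies the same prepared steps to other data, which is the usual pattern for a test set.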
