Generate a dummy-variable

遇见更好的自我 2020-11-21 11:41

I'm having trouble generating the following dummy variables in R:

I'm analyzing yearly time series data (time period 1948-2009). I have two questions:

  1. <
17 Answers
  •  抹茶落季
    2020-11-21 12:17

    The other answers here offer direct routes to accomplish this task, which many modeling functions (e.g. lm) will do for you internally anyway. Nonetheless, here are ways to make dummy variables with Max Kuhn's popular caret and recipes packages. While somewhat more verbose, both scale easily to more complicated situations and fit neatly into their respective frameworks.
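
    To illustrate the remark that lm handles this internally, here is a minimal base-R sketch. It assumes a toy data frame (named df_demo here purely for illustration) and uses model.matrix, which builds the same design matrix that lm constructs from a formula:

    ```r
    # Toy data; `df_demo` is an illustrative name, not part of the question
    df_demo <- data.frame(letter = factor(rep(c("a", "b", "c"), each = 2)),
                          y = 1:6)

    # model.matrix() expands the factor into treatment-coded dummy columns,
    # dropping the first level ("a") as the reference category
    mm <- model.matrix(y ~ letter, data = df_demo)
    mm

    # lm() builds the same columns behind the scenes
    fit <- lm(y ~ letter, data = df_demo)
    names(coef(fit))
    ```

    The coefficient names from lm match the column names of the design matrix, so explicit dummy columns are only needed when something outside the model formula requires them.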


    caret::dummyVars

    With caret, the relevant function is dummyVars, which has a predict method to apply it on a data frame:

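    # stringsAsFactors = TRUE keeps `letter` a factor on R >= 4.0,
    # where data.frame() no longer converts strings by default
    df <- data.frame(letter = rep(c('a', 'b', 'c'), each = 2),
                     y = 1:6,
                     stringsAsFactors = TRUE)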
    
    library(caret)
    
    dummy <- dummyVars(~ ., data = df, fullRank = TRUE)
    
    dummy
    #> Dummy Variable Object
    #> 
    #> Formula: ~.
    #> 2 variables, 1 factors
    #> Variables and levels will be separated by '.'
    #> A full rank encoding is used
    
    predict(dummy, df)
    #>   letter.b letter.c y
    #> 1        0        0 1
    #> 2        0        0 2
    #> 3        1        0 3
    #> 4        1        0 4
    #> 5        0        1 5
    #> 6        0        1 6
    
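    For contrast, a hedged sketch of the same call with fullRank = FALSE, which should keep one indicator column per level instead of dropping the reference level. It assumes caret is installed; the name one_hot is illustrative only:

    ```r
    library(caret)

    df <- data.frame(letter = factor(rep(c("a", "b", "c"), each = 2)),
                     y = 1:6)

    # fullRank = FALSE requests a one-hot encoding: every level of `letter`
    # gets its own column, so the indicator columns sum to 1 in each row
    one_hot_spec <- dummyVars(~ ., data = df, fullRank = FALSE)

    # predict() returns a matrix; convert it if a data frame is needed
    one_hot <- as.data.frame(predict(one_hot_spec, newdata = df))
    one_hot
    ```

    The full-rank version is usually the safer choice for linear models with an intercept, since the one-hot columns are perfectly collinear with it.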

    recipes::step_dummy

    With recipes, the relevant function is step_dummy:

    library(recipes)
    
    dummy_recipe <- recipe(y ~ letter, df) %>% 
        step_dummy(letter)
    
    dummy_recipe
    #> Data Recipe
    #> 
    #> Inputs:
    #> 
    #>       role #variables
    #>    outcome          1
    #>  predictor          1
    #> 
    #> Steps:
    #> 
    #> Dummy variables from letter
    

    Depending on context, extract the data with prep and either bake or juice:

    # Prep and bake on new data...
    dummy_recipe %>% 
        prep() %>% 
        bake(df)
    #> # A tibble: 6 x 3
    #>       y letter_b letter_c
    #>   <int>    <dbl>    <dbl>
    #> 1     1        0        0
    #> 2     2        0        0
    #> 3     3        1        0
    #> 4     4        1        0
    #> 5     5        0        1
    #> 6     6        0        1
    
    # ...or use `retain = TRUE` and `juice` to extract training data
    dummy_recipe %>% 
        prep(retain = TRUE) %>% 
        juice()
    #> # A tibble: 6 x 3
    #>       y letter_b letter_c
    #>   <int>    <dbl>    <dbl>
    #> 1     1        0        0
    #> 2     2        0        0
    #> 3     3        1        0
    #> 4     4        1        0
    #> 5     5        0        1
    #> 6     6        0        1
    
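    The point of prep followed by bake is that the encoding learned from the training data can be reapplied to other data. Here is a minimal sketch, assuming a recent version of recipes (where the second argument of bake is named new_data) and a made-up data frame new_df:

    ```r
    library(recipes)

    df <- data.frame(letter = factor(rep(c("a", "b", "c"), each = 2)),
                     y = 1:6)

    # Learn the dummy encoding from the training data
    trained <- recipe(y ~ letter, data = df) %>%
        step_dummy(letter) %>%
        prep()

    # Hypothetical new observations; only levels "c" and "a" occur here
    new_df <- data.frame(letter = factor(c("c", "a"), levels = c("a", "b", "c")),
                         y = 7:8)

    # bake() applies the levels learned in prep(), so letter_b is still
    # created even though "b" never appears in new_df
    baked <- bake(trained, new_data = new_df)
    baked
    ```

    Because the columns come from the trained recipe rather than from the new data, training and prediction sets end up with identical dummy columns.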
