Create new dummy variable columns from categorical variable

礼貌的吻别 2020-11-28 04:03

I have several data sets with 75,000 observations and a type variable that can take the values 0-4. I want to add five new dummy variables to each data set f

  • 2020-11-28 05:05

    What about using model.matrix()?

    > binom <- data.frame(data=runif(1e5),type=sample(0:4,1e5,TRUE))
    > head(binom)
           data type
    1 0.1412164    2
    2 0.8764588    2
    3 0.5559061    4
    4 0.3890109    3
    5 0.8725753    3
    6 0.8358100    1
    > inds <- model.matrix(~ factor(binom$type) - 1)
    > head(inds)
      factor(binom$type)0 factor(binom$type)1 factor(binom$type)2 factor(binom$type)3 factor(binom$type)4
    1                   0                   0                   1                   0                   0
    2                   0                   0                   1                   0                   0
    3                   0                   0                   0                   0                   1
    4                   0                   0                   0                   1                   0
    5                   0                   0                   0                   1                   0
    6                   0                   1                   0                   0                   0
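To attach the indicator columns to each data set under shorter names, the model.matrix() call can be wrapped in a small function and applied over a list. This is only a sketch: the helper name `add_type_dummies`, the `type0` ... `type4` column names, and the toy `datasets` list are illustrative assumptions, and it assumes `type` has no missing values (model.matrix() would drop those rows).

```r
# Hypothetical helper: append one 0/1 column per level of `type` to a data frame.
add_type_dummies <- function(df, lv = 0:4) {
  # Fixing the levels yields five columns even if a value never occurs in df.
  f <- factor(df$type, levels = lv)
  inds <- model.matrix(~ f - 1)
  colnames(inds) <- paste0("type", lv)  # type0 ... type4
  cbind(df, inds)
}

# Toy stand-ins for the real data sets.
set.seed(1)
datasets <- list(
  a = data.frame(data = runif(10), type = sample(0:4, 10, TRUE)),
  b = data.frame(data = runif(10), type = sample(0:4, 10, TRUE))
)

datasets <- lapply(datasets, add_type_dummies)
head(datasets$a)
```

Each data frame keeps its original columns and gains five indicator columns, exactly one of which is 1 in every row.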
  • 2020-11-28 05:05

    The recipes package is another good option. The example below is verbose for a single step, but the pipeline stays clean as you add more preprocessing steps.

    library(recipes)  # also provides the %>% pipe used below

    binom <- data.frame(y = runif(1e5),
                        x = runif(1e5),
                        catVar = as.factor(sample(0:4, 1e5, TRUE))) # use the example from gappy
    new_data <- recipe(y ~ ., data = binom) %>%
      step_dummy(catVar) %>% # add dummy variables (drops the first level unless one_hot = TRUE)
      prep(training = binom) %>% # estimate the preprocessing steps (could be more than just adding dummy variables)
      bake(new_data = binom) # apply the recipe to new data

    Other available steps include step_scale, step_center, and step_pca.
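As a sketch of how further steps chain together, the variant below centers and scales `x` and asks step_dummy() for one column per level through `one_hot = TRUE`, which gives the five indicators the question asks for. The smaller sample size and the explicit `levels = 0:4` are assumptions made to keep the example quick and deterministic in shape; the generated dummy column names depend on the recipes version, so they are printed rather than assumed.

```r
library(recipes)

set.seed(1)
binom <- data.frame(y = runif(100),
                    x = runif(100),
                    catVar = factor(sample(0:4, 100, TRUE), levels = 0:4))

rec <- recipe(y ~ ., data = binom) %>%
  step_center(x) %>%                      # subtract the training mean of x
  step_scale(x) %>%                       # divide by the training sd of x
  step_dummy(catVar, one_hot = TRUE) %>%  # keep all five levels as 0/1 columns
  prep(training = binom)                  # estimate means, sds, and levels

new_data <- bake(rec, new_data = binom)   # apply the trained recipe
names(new_data)
```

Keeping the prepped recipe in `rec` means the same trained steps can be applied to another data set with a further bake() call.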
