Create new dummy variable columns from a categorical variable


I have several data sets with 75,000 observations and a type variable that can take on a value 0-4. I want to add five new dummy variables to each data set f


8 answers

佛祖请我去吃肉

2020-11-28 05:05

What about using model.matrix()?

> binom <- data.frame(data=runif(1e5),type=sample(0:4,1e5,TRUE))
> head(binom)
       data type
1 0.1412164    2
2 0.8764588    2
3 0.5559061    4
4 0.3890109    3
5 0.8725753    3
6 0.8358100    1
> inds <- model.matrix(~ factor(binom$type) - 1)
> head(inds)
  factor(binom$type)0 factor(binom$type)1 factor(binom$type)2 factor(binom$type)3 factor(binom$type)4
1                   0                   0                   1                   0                   0
2                   0                   0                   1                   0                   0
3                   0                   0                   0                   0                   1
4                   0                   0                   0                   1                   0
5                   0                   0                   0                   1                   0
6                   0                   1                   0                   0                   0
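
To attach the indicators to the original data with shorter column names, something like the following might work. It is only a sketch: the small sample size, the seed, and the type0 ... type4 column names are assumptions made for illustration, not part of the session above.

```r
# Small reproducible stand-in for the data above (size and seed are assumptions)
set.seed(42)
binom <- data.frame(data = runif(20), type = sample(0:4, 20, TRUE))

# Fix the levels explicitly so all five columns appear even if a value is absent
type_f <- factor(binom$type, levels = 0:4)

# One indicator column per level; "- 1" drops the intercept
inds <- model.matrix(~ type_f - 1)

# Hypothetical, shorter column names: type0 ... type4
colnames(inds) <- paste0("type", levels(type_f))

# Append the indicator columns to the original data frame
binom <- cbind(binom, inds)
head(binom)
```

Setting the levels by hand is a design choice: it keeps the column layout identical across several data sets, even when one of them happens to lack a type value.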


春和景丽

2020-11-28 05:05

The recipes package is another powerful option. The example below looks verbose for a single step, but the approach pays off as soon as you add more preprocessing steps.

library(recipes)

binom <- data.frame(y = runif(1e5), 
                    x = runif(1e5),
                    catVar = as.factor(sample(0:4, 1e5, TRUE))) # use the example from gappy
head(binom)

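new_data <- recipe(y ~ ., data = binom) %>% 
  step_dummy(catVar) %>% # add dummy variables; by default the first level is dropped (4 columns), pass one_hot = TRUE to keep all five
  prep(training = binom) %>% # estimate the preprocessing steps from the training data (could be more than just adding dummy variables)
  bake(new_data = binom) # apply the prepared recipe to data; the argument is new_data (older releases used newdata)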
head(new_data)

Other available steps include step_scale, step_center, and step_pca.
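
As a rough sketch of how such steps might be chained, the recipe below centers and scales the numeric predictor before creating the dummies. It assumes a reasonably recent recipes release (for all_numeric_predictors() and bake(new_data = NULL)); the small sample size and the seed are made up for illustration.

```r
library(recipes)

# Small reproducible stand-in for the data above (size and seed are assumptions)
set.seed(42)
binom <- data.frame(y = runif(200),
                    x = runif(200),
                    catVar = factor(sample(0:4, 200, TRUE), levels = 0:4))

rec <- recipe(y ~ ., data = binom) %>%
  step_center(all_numeric_predictors()) %>%  # subtract the training mean
  step_scale(all_numeric_predictors()) %>%   # divide by the training sd
  step_dummy(catVar, one_hot = TRUE) %>%     # keep one column per level
  prep(training = binom)                     # estimate means, sds, and levels

# new_data = NULL returns the processed training set
out <- bake(rec, new_data = NULL)
head(out)
```

The numeric steps come before step_dummy on purpose: once the dummies exist they are numeric too, and a later all_numeric_predictors() would center and scale them as well.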

