I have several data sets with 75,000 observations and a type
variable that can take on a value 0-4. I want to add five new dummy variables to each data set f
What about using model.matrix()?
> binom <- data.frame(data=runif(1e5),type=sample(0:4,1e5,TRUE))
> head(binom)
data type
1 0.1412164 2
2 0.8764588 2
3 0.5559061 4
4 0.3890109 3
5 0.8725753 3
6 0.8358100 1
> inds <- model.matrix(~ factor(binom$type) - 1)
> head(inds)
factor(binom$type)0 factor(binom$type)1 factor(binom$type)2 factor(binom$type)3 factor(binom$type)4
1 0 0 1 0 0
2 0 0 1 0 0
3 0 0 0 0 1
4 0 0 0 1 0
5 0 0 0 1 0
6 0 1 0 0 0
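To get from the indicator matrix to what the question asks for (five new columns on each data set), the matrix can be bound back onto the data frame. The following is a minimal sketch, not the only way to do it: `add_type_dummies` and the column names `type0` to `type4` are hypothetical names chosen for illustration, and it assumes `type` has no missing values (`model.matrix()` drops rows with `NA` by default, which would break the `cbind()`).

```r
# Hypothetical helper: append one 0/1 indicator column per level of `type`.
add_type_dummies <- function(df, levels = 0:4) {
  # Fix the levels so every data set gets the same five columns,
  # even if one of the values happens not to occur in it.
  f <- factor(df$type, levels = levels)
  inds <- model.matrix(~ f - 1)             # "- 1" drops the intercept, keeping all levels
  colnames(inds) <- paste0("type", levels)  # shorter names than "factor(binom$type)0", ...
  cbind(df, inds)
}

set.seed(1)
binom <- data.frame(data = runif(1e5), type = sample(0:4, 1e5, TRUE))
binom2 <- add_type_dummies(binom)
head(binom2)

# Several data sets can be handled in one go if they are kept in a list.
datasets <- list(a = binom, b = binom[1:100, ])
datasets <- lapply(datasets, add_type_dummies)
```

Keeping the data sets in a list avoids repeating the same call for each one.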
The recipes package is another powerful way to do this. The example below is verbose for a single step, but the approach becomes very clean once you add more preprocessing steps.
library(recipes)
binom <- data.frame(y = runif(1e5),
x = runif(1e5),
catVar = as.factor(sample(0:4, 1e5, TRUE))) # use the example from gappy
head(binom)
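new_data <- recipe(y ~ ., data = binom) %>%
  step_dummy(catVar) %>%      # add dummy variables (by default the first level is the reference and gets no column)
  prep(training = binom) %>%  # estimate the preprocessing steps (could be more than just adding dummy variables)
  bake(new_data = binom)      # apply the prepared recipe to new data (the argument was called `newdata` in old recipes versions)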
head(new_data)
Other step examples are step_scale, step_center, step_pca, etc.
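As a rough sketch of how further steps chain together, the recipe below centers and scales `x` before creating the dummy variables. It assumes a reasonably recent version of recipes, where `bake()` takes `new_data` and `step_dummy()` accepts `one_hot`; setting `one_hot = TRUE` is a choice made here so that all five levels get a column, as the question asks, instead of the default four.

```r
library(recipes)

set.seed(1)
binom <- data.frame(y = runif(1e5),
                    x = runif(1e5),
                    catVar = as.factor(sample(0:4, 1e5, TRUE)))

rec <- recipe(y ~ ., data = binom) %>%
  step_center(x) %>%                    # subtract the training mean of x
  step_scale(x) %>%                     # divide by the training standard deviation of x
  step_dummy(catVar, one_hot = TRUE)    # one indicator column per level of catVar

new_data <- rec %>%
  prep(training = binom) %>%            # estimate means, sds and factor levels
  bake(new_data = binom)                # apply the prepared steps

head(new_data)
```

Because the statistics are estimated in `prep()`, the same prepared recipe can be baked on other data sets that have the same columns.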