Question
I want to create 7 dummy variables, one for each day, using dplyr.
So far, I have managed to do it using the sjmisc package and its to_dummy function, but it takes 2 steps: 1) create a data frame of dummies, 2) append it to the original data frame.
library(dplyr)   # for bind_cols()

# Sample data frame
mydf <- data.frame(x = rep(letters[1:9]),
                   day = c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Fri","Mon"))
# 1. Create the 7 dummy variables separately
daysdummy <- sjmisc::to_dummy(mydf$day, suffix = "label")
# 2. Append them to the data frame
mydf <- bind_cols(mydf, daysdummy)
> mydf
x day day_Fri day_Mon day_Sat day_Sun day_Thurs day_Tues day_Wed
1 a Mon 0 1 0 0 0 0 0
2 b Tues 0 0 0 0 0 1 0
3 c Wed 0 0 0 0 0 0 1
4 d Thurs 0 0 0 0 1 0 0
5 e Fri 1 0 0 0 0 0 0
6 f Sat 0 0 1 0 0 0 0
7 g Sun 0 0 0 1 0 0 0
8 h Fri 1 0 0 0 0 0 0
9 i Mon 0 1 0 0 0 0 0
My question is whether I can do it in a single dplyr workflow, bringing to_dummy into the pipe, perhaps using mutate?
* to_dummy documentation
Answer 1:
If you want to do this with the pipe, you can do something like:
library(dplyr)
library(sjmisc)
mydf %>%
to_dummy(day, suffix = "label") %>%
bind_cols(mydf) %>%
select(x, day, everything())
Returns:
# A tibble: 9 x 9
  x     day   day_Fri day_Mon day_Sat day_Sun day_Thurs day_Tues day_Wed
  <fct> <fct>   <dbl>   <dbl>   <dbl>   <dbl>     <dbl>    <dbl>   <dbl>
1 a     Mon        0.      1.      0.      0.        0.       0.      0.
2 b     Tues       0.      0.      0.      0.        0.       1.      0.
3 c     Wed        0.      0.      0.      0.        0.       0.      1.
4 d     Thurs      0.      0.      0.      0.        1.       0.      0.
5 e     Fri        1.      0.      0.      0.        0.       0.      0.
6 f     Sat        0.      0.      1.      0.        0.       0.      0.
7 g     Sun        0.      0.      0.      1.        0.       0.      0.
8 h     Fri        1.      0.      0.      0.        0.       0.      0.
9 i     Mon        0.      1.      0.      0.        0.       0.      0.
With dplyr and tidyr we could do:
library(dplyr)
library(tidyr)
mydf %>%
mutate(var = 1) %>%
spread(day, var, fill = 0, sep = "_") %>%
left_join(mydf) %>%
select(x, day, everything())
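spread() still works, but current tidyr marks it as superseded in favour of pivot_wider(). A minimal sketch of the same idea with pivot_wider(), assuming tidyr 1.1.0 or later (for the scalar values_fill and the names_sort argument). Instead of joining back onto mydf, it pivots a copy of day (the helper column name key is hypothetical, chosen for illustration) so the original day column survives the reshape. This assumes each row is uniquely identified by its non-day columns, which holds here because x is unique.

```r
library(dplyr)
library(tidyr)

mydf <- data.frame(x = letters[1:9],
                   day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon"),
                   stringsAsFactors = FALSE)

out <- mydf %>%
  # key is a hypothetical helper copy of day: pivot_wider() consumes it, day itself stays
  mutate(key = day, var = 1L) %>%
  pivot_wider(names_from = key, values_from = var,
              values_fill = 0L,      # days that do not occur in a row become 0
              names_prefix = "day_", # day_Fri, day_Mon, ...
              names_sort = TRUE)     # alphabetical column order, like spread()
out
```

Because x and day are the only remaining columns, pivot_wider() uses them as the row identifiers and places them first, so no select() is needed afterwards.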
And with base R we could do something like:
as.data.frame.matrix(table(rep(mydf$x, lengths(mydf$day)), unlist(mydf$day)))
Returns:
  Fri Mon Sat Sun Thurs Tues Wed
a   0   1   0   0     0    0   0
b   0   0   0   0     0    1   0
c   0   0   0   0     0    0   1
d   0   0   0   0     1    0   0
e   1   0   0   0     0    0   0
f   0   0   1   0     0    0   0
g   0   0   0   1     0    0   0
h   1   0   0   0     0    0   0
i   0   1   0   0     0    0   0
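The table() version returns only the dummy columns, with x moved into the row names and day dropped. If you want the dummies appended to the original data frame, as in the question, a minimal base R sketch (no packages) builds one 0/1 integer column per distinct day with vapply() and binds it on. The day_ prefix is an assumption chosen only to match the sjmisc names.

```r
mydf <- data.frame(x = letters[1:9],
                   day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon"),
                   stringsAsFactors = FALSE)

# Distinct days in alphabetical order, matching the column order of the other answers
lvls <- sort(unique(mydf$day))

# One column per day: 1L where the row's day equals that level, 0L otherwise
dummies <- vapply(lvls, function(l) as.integer(mydf$day == l), integer(nrow(mydf)))
colnames(dummies) <- paste0("day_", lvls)

# Append the 9 x 7 integer matrix to the original columns
mydf <- cbind(mydf, dummies)
mydf
```

vapply() is used rather than sapply() so the result is guaranteed to be an integer matrix with one row per observation.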
Answer 2:
Instead of sjmisc::to_dummy, you can also use base R's model.matrix; a dplyr solution would be:
library(dplyr)
model.matrix(~ 0 + day, mydf) %>%
  as.data.frame() %>%
  bind_cols(mydf) %>%
  select(x, day, everything())
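The 0 + in the formula matters: it removes the intercept, so model.matrix() codes every level of day as its own column. With the default intercept you get treatment contrasts instead, and the first level (Fri, alphabetically) is absorbed into (Intercept). A small base R sketch comparing the two and doing the final bind with cbind(), so it needs no packages:

```r
mydf <- data.frame(x = letters[1:9],
                   day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon"),
                   stringsAsFactors = FALSE)

# Default intercept: 6 dummy columns, Fri is the reference level
with_intercept <- model.matrix(~ day, mydf)
colnames(with_intercept)

# No intercept: all 7 levels get a column (dayFri, dayMon, ...)
no_intercept <- model.matrix(~ 0 + day, mydf)
colnames(no_intercept)

# Base R counterpart of bind_cols() + select(): original columns first, dummies after
out <- cbind(mydf, no_intercept)
out
```

model.matrix() converts the character day column to a factor internally, so the sample data does not need stringsAsFactors = TRUE.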
# x day dayFri dayMon daySat daySun dayThurs dayTues dayWed
#1 a Mon 0 1 0 0 0 0 0
#2 b Tues 0 0 0 0 0 1 0
#3 c Wed 0 0 0 0 0 0 1
#4 d Thurs 0 0 0 0 1 0 0
#5 e Fri 1 0 0 0 0 0 0
#6 f Sat 0 0 1 0 0 0 0
#7 g Sun 0 0 0 1 0 0 0
#8 h Fri 1 0 0 0 0 0 0
#9 i Mon 0 1 0 0 0 0 0
Answer 3:
An alternative solution, which I think would be quicker, uses dummy() from the dummies package:
mydf = data.frame(x=rep(letters[1:9]),
day=c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Fri","Mon"))
library(dummies)
mydf <- cbind(mydf, dummy(mydf$day, sep = "_"))
That yields:
x day mydf_Fri mydf_Mon mydf_Sat mydf_Sun mydf_Thurs mydf_Tues mydf_Wed
1 a Mon 0 1 0 0 0 0 0
2 b Tues 0 0 0 0 0 1 0
3 c Wed 0 0 0 0 0 0 1
4 d Thurs 0 0 0 0 1 0 0
5 e Fri 1 0 0 0 0 0 0
6 f Sat 0 0 1 0 0 0 0
7 g Sun 0 0 0 1 0 0 0
8 h Fri 1 0 0 0 0 0 0
9 i Mon 0 1 0 0 0 0 0
Then you can use gsub() to get cleaner names:
names(mydf) = gsub("mydf_", "", names(mydf))
head(mydf)
x day Fri Mon Sat Sun Thurs Tues Wed
1 a Mon 0 1 0 0 0 0 0
2 b Tues 0 0 0 0 0 1 0
3 c Wed 0 0 0 0 0 0 1
4 d Thurs 0 0 0 0 1 0 0
5 e Fri 1 0 0 0 0 0 0
6 f Sat 0 0 1 0 0 0 0
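gsub() treats "mydf_" as a regular expression and replaces it anywhere in a name, so an unrelated column such as the hypothetical old_mydf_col below would be renamed too. Anchoring the pattern with ^ and using sub(), which replaces only the first match, limits the cleanup to the prefix that dummy() added. A small sketch on a made-up vector of names:

```r
# Hypothetical names: x, day, two dummy() columns and one unrelated column
nms <- c("x", "day", "mydf_Fri", "mydf_Mon", "old_mydf_col")

gsub("mydf_", "", nms)   # also rewrites old_mydf_col to old_col
sub("^mydf_", "", nms)   # strips only a leading "mydf_"
```

In the answer above this would be names(mydf) <- sub("^mydf_", "", names(mydf)).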
Source: https://stackoverflow.com/questions/49276914/mutating-dummy-variables-in-dplyr