Mutating dummy variables in dplyr

名媛妹妹 2021-01-02 23:24

I want to create seven dummy variables, one for each day of the week, using dplyr.

So far, I have managed to do it using the to_dummy() function from the sjmisc package.

3 answers
  • 2021-01-02 23:39

    An alternative that I think would be quicker is dummy() from the dummies package:

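    # Example data: nine rows, one weekday label per row
    mydf <- data.frame(x = letters[1:9],
                       day = c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Fri","Mon"))
    
    library(dummies)
    
    # Append one 0/1 indicator column per distinct value of day
    mydf <- cbind(mydf, dummy(mydf$day, sep = "_"))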
    

    That yields

      x   day mydf_Fri mydf_Mon mydf_Sat mydf_Sun mydf_Thurs mydf_Tues mydf_Wed
    1 a   Mon        0        1        0        0          0         0        0
    2 b  Tues        0        0        0        0          0         1        0
    3 c   Wed        0        0        0        0          0         0        1
    4 d Thurs        0        0        0        0          1         0        0
    5 e   Fri        1        0        0        0          0         0        0
    6 f   Sat        0        0        1        0          0         0        0
    7 g   Sun        0        0        0        1          0         0        0
    8 h   Fri        1        0        0        0          0         0        0
    9 i   Mon        0        1        0        0          0         0        0
    

    Then you can use gsub() to get cleaner column names:

    names(mydf) = gsub("mydf_", "", names(mydf))
    head(mydf)
      x   day Fri Mon Sat Sun Thurs Tues Wed
    1 a   Mon   0   1   0   0     0    0   0
    2 b  Tues   0   0   0   0     0    1   0
    3 c   Wed   0   0   0   0     0    0   1
    4 d Thurs   0   0   0   0     1    0   0
    5 e   Fri   1   0   0   0     0    0   0
    6 f   Sat   0   0   1   0     0    0   0
    
  • 2021-01-02 23:40

    If you want to do this with the pipe, you can do something like:

    library(dplyr)
    library(sjmisc)
    
    mydf %>% 
      to_dummy(day, suffix = "label") %>% 
      bind_cols(mydf) %>% 
      select(x, day, everything())
    

    Returns:

    # A tibble: 9 x 9
      x     day   day_Fri day_Mon day_Sat day_Sun day_Thurs day_Tues day_Wed
      <fct> <fct>   <dbl>   <dbl>   <dbl>   <dbl>     <dbl>    <dbl>   <dbl>
    1 a     Mon        0.      1.      0.      0.        0.       0.      0.
    2 b     Tues       0.      0.      0.      0.        0.       1.      0.
    3 c     Wed        0.      0.      0.      0.        0.       0.      1.
    4 d     Thurs      0.      0.      0.      0.        1.       0.      0.
    5 e     Fri        1.      0.      0.      0.        0.       0.      0.
    6 f     Sat        0.      0.      1.      0.        0.       0.      0.
    7 g     Sun        0.      0.      0.      1.        0.       0.      0.
    8 h     Fri        1.      0.      0.      0.        0.       0.      0.
    9 i     Mon        0.      1.      0.      0.        0.       0.      0.
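
If adding sjmisc is not an option, the same indicator columns can probably be built with dplyr alone. This is a minimal sketch, not taken from the answers here: it recreates mydf so the block runs on its own, and the names day_levels, indicators and result, as well as the day_ prefix, are illustrative choices.

```r
library(dplyr)

# Same example data as in the first answer
mydf <- data.frame(
  x = letters[1:9],
  day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon")
)

# Distinct day labels, in order of first appearance
day_levels <- as.character(unique(mydf$day))

# One named 0/1 vector per label: day_Mon, day_Tues, ...
indicators <- lapply(
  setNames(day_levels, paste0("day_", day_levels)),
  function(d) as.integer(mydf$day == d)
)

# Splice the named list into mutate() so each element becomes a column
result <- mydf %>% mutate(!!!indicators)
```

The columns come out in order of first appearance rather than alphabetically; sorting day_levels first would give the alphabetical order shown in the other outputs.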
    

    With dplyr and tidyr we could do:

    library(dplyr)
    library(tidyr)
    
    mydf %>% 
      mutate(var = 1) %>% 
      spread(day, var, fill = 0, sep = "_") %>% 
      left_join(mydf) %>% 
      select(x, day, everything())
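
spread() has since been superseded in tidyr by pivot_wider(). A rough equivalent is sketched below, assuming tidyr 1.1.0 or later (where values_fill accepts a single value). The helper column day_key and the name dummies_wide are illustrative and not part of the original answer; copying day into day_key keeps the original day column in the result, so no join back to mydf is needed.

```r
library(dplyr)
library(tidyr)

# Same example data as in the first answer
mydf <- data.frame(
  x = letters[1:9],
  day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon")
)

dummies_wide <- mydf %>%
  # var marks the matching day; day_key is a copy that gets spread into columns
  mutate(var = 1, day_key = day) %>%
  pivot_wider(
    names_from   = day_key,
    values_from  = var,
    values_fill  = 0,        # non-matching days become 0 instead of NA
    names_prefix = "day_"
  )
```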
    

    And with base R we could do something like:

    as.data.frame.matrix(table(rep(mydf$x, lengths(mydf$day)), unlist(mydf$day)))
    

    Returns:

      Fri Mon Sat Sun Thurs Tues Wed
    a   0   1   0   0     0    0   0
    b   0   0   0   0     0    1   0
    c   0   0   0   0     0    0   1
    d   0   0   0   0     1    0   0
    e   1   0   0   0     0    0   0
    f   0   0   1   0     0    0   0
    g   0   0   0   1     0    0   0
    h   1   0   0   0     0    0   0
    i   0   1   0   0     0    0   0
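
Since day holds a single value per row here, rep(mydf$x, lengths(mydf$day)) and unlist(mydf$day) reduce to the columns themselves, so the call can be shortened to table(mydf$x, mydf$day). The sketch below also attaches the result to the original data; counts and result are illustrative names. Note that table() counts occurrences, so the columns are 0/1 only because every x appears exactly once.

```r
# Same example data as in the first answer
mydf <- data.frame(
  x = letters[1:9],
  day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon")
)

# Cross-tabulate x against day: one row per x, one column per day
counts <- as.data.frame.matrix(table(mydf$x, mydf$day))

# Look up each row by its x value and bind the indicators to the original data
result <- cbind(mydf, counts[as.character(mydf$x), ])
rownames(result) <- NULL
```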
    
  • 2021-01-02 23:40

    Instead of sjmisc::to_dummy you can also use base R's model.matrix; a dplyr solution would be:

    library(dplyr)
    model.matrix(~ 0 + day, mydf) %>%
        as.data.frame() %>%
        bind_cols(mydf) %>%
        select(x, day, everything())
    #  x   day dayFri dayMon daySat daySun dayThurs dayTues dayWed
    #1 a   Mon      0      1      0      0        0       0      0
    #2 b  Tues      0      0      0      0        0       1      0
    #3 c   Wed      0      0      0      0        0       0      1
    #4 d Thurs      0      0      0      0        1       0      0
    #5 e   Fri      1      0      0      0        0       0      0
    #6 f   Sat      0      0      1      0        0       0      0
    #7 g   Sun      0      0      0      1        0       0      0
    #8 h   Fri      1      0      0      0        0       0      0
    #9 i   Mon      0      1      0      0        0       0      0
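
The 0 + in the formula removes the intercept, which is why all seven days get their own column. With the default intercept, model.matrix() drops the first level as the reference category. A short sketch of the difference, with an optional rename to the day_ style used in the other answers; mm_full and mm_ref are illustrative names:

```r
# Same example data as in the first answer
mydf <- data.frame(
  x = letters[1:9],
  day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon")
)

# No intercept: one indicator column per day (7 columns)
mm_full <- model.matrix(~ 0 + day, mydf)

# Default intercept: "(Intercept)" plus 6 indicators;
# the first level in sorted order (Fri) becomes the reference and is dropped
mm_ref <- model.matrix(~ day, mydf)

# Optional: rename "dayFri" to "day_Fri", and so on
colnames(mm_full) <- sub("^day", "day_", colnames(mm_full))
```

For regression input the reference-coded version is usually what you want; for a plain set of seven dummies, keep the 0 + form.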
    