Reconstruct a categorical variable from dummies in R

予麋鹿 2021-01-15 16:40

Heyho, I am a beginner in R and have a problem I haven't been able to solve so far: I would like to transform a set of dummy (0/1 indicator) variables back into categorical variables.

3 Answers
  • 2021-01-15 17:13

    You can do this with data.table

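    library(data.table)
    id_cols <- c("x1", "x2")  # columns that identify each row
    # Stack the dummy columns into long format, then keep the rows where the dummy is 1
    melt(data = dt, id.vars = id_cols,
         measure.vars = patterns("dummy"),
         na.rm = TRUE)[value == 1]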
    

    Example:

    library(data.table)
    dt <- data.table(dummy_a = c(1, 0, 0), dummy_b = c(0, 1, 0),
                     dummy_c = c(0, 0, 1), id = c(1, 2, 3))
    # One row per (id, dummy column); keep only the rows where the dummy is 1
    melt(data = dt,
         id.vars = "id",
         measure.vars = patterns("dummy_"),
         na.rm = TRUE)[value == 1, .(id, variable)]
    

    Output

       id variable
    1:  1  dummy_a
    2:  2  dummy_b
    3:  3  dummy_c
    

    It's even easier if you replace 0 with NA: then na.rm = TRUE in melt drops every row with NA, so you no longer need the value == 1 filter.
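Following that suggestion, a minimal sketch that recodes 0 as NA before melting. Selecting the dummy columns with grep() and naming the output column "category" are illustrative choices, not part of the answer above:

```r
library(data.table)

dt <- data.table(dummy_a = c(1, 0, 0), dummy_b = c(0, 1, 0),
                 dummy_c = c(0, 0, 1), id = c(1, 2, 3))

# Recode 0 as NA in every dummy column, modifying dt by reference
dummy_cols <- grep("^dummy_", names(dt), value = TRUE)
dt[, (dummy_cols) := lapply(.SD, function(x) replace(x, x == 0, NA)),
   .SDcols = dummy_cols]

# With the 0s gone, na.rm = TRUE drops every "absent" row,
# so no value == 1 filter is needed afterwards
long <- melt(dt, id.vars = "id", measure.vars = dummy_cols,
             variable.name = "category", na.rm = TRUE)
long[, value := NULL]  # the remaining values are all 1, so drop the column

long
```

Note that := changes dt in place; call copy(dt) first if you still need the original 0/1 columns.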

  • 2021-01-15 17:15

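    We can use max.col from base R, which returns, for each row, the index of the column holding the maximum value (here, the position of the 1):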

    data.frame(dummy = names(df1)[max.col(df1)])
    #    dummy
    #1 dummy2
    #2 dummy1
    #3 dummy2
    #4 dummy3
    

    data

    df1 <- structure(list(dummy1 = c(0L, 1L, 0L, 0L), dummy2 = c(1L, 0L, 
     1L, 0L), dummy3 = c(0L, 0L, 0L, 1L)), .Names = c("dummy1", "dummy2", 
     "dummy3"), class = "data.frame", row.names = c(NA, -4L))
    
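One caveat: max.col defaults to ties.method = "random", so a row with no 1 at all (or with several 1s) is assigned an arbitrary column. A minimal base-R sketch that makes this explicit; the helper undummy() and the extra all-zero fifth row in df2 are hypothetical, added only for illustration:

```r
# Hypothetical helper: collapse a block of 0/1 dummy columns into one factor.
# ties.method = "first" makes the choice deterministic when a row has several 1s;
# rows with no 1 at all are set to NA instead of getting an arbitrary column.
undummy <- function(dummies, categories = names(dummies)) {
  m <- as.matrix(dummies)
  idx <- max.col(m, ties.method = "first")
  idx[rowSums(m) == 0] <- NA
  factor(categories[idx], levels = categories)
}

# Same dummies as df1, plus a hypothetical fifth row with no category set
df2 <- data.frame(
  dummy1 = c(0L, 1L, 0L, 0L, 0L),
  dummy2 = c(1L, 0L, 1L, 0L, 0L),
  dummy3 = c(0L, 0L, 0L, 1L, 0L)
)

df2$category <- undummy(df2[c("dummy1", "dummy2", "dummy3")])
df2
```

Returning a factor with all dummy names as levels keeps categories that happen not to occur in the data.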
  • 2021-01-15 17:26

    Here is a tidyverse solution using tidyr::gather. The key column records which dummy (that is, which category) a value came from, and the value column records presence/absence. Replacing 0 with NA and setting na.rm = TRUE in gather drops the rows for absent categories right away, so we never build an unnecessarily large intermediate dataset. Several independent groups of dummies (here dummy1:dummy3 and ed1:ed2) are handled by gathering each group in turn.

    df1 <- structure(list(dummy1 = c(0L, 1L, 0L, 0L),
                          dummy2 = c(1L, 0L, 1L, 0L),
                          dummy3 = c(0L, 0L, 0L, 1L),
                          ed1 = c(1, 0, 1, 0),
                          ed2 = c(0, 1, 0, 1),
                          id = c(1, 2, 3, 4)),
                     .Names = c("dummy1", "dummy2", "dummy3", "ed1", "ed2", "id"),
                     row.names = c(NA, -4L), class = "data.frame")
    library(tidyverse)
    df1 %>%
      mutate_at(vars(dummy1:dummy3, ed1:ed2), ~ ifelse(. == 0, NA, .)) %>%
      gather("dummy", "present", dummy1:dummy3, na.rm = TRUE) %>%
      gather("ed", "present2", ed1:ed2, na.rm = TRUE) %>%
      select(-present, -present2)
    #>   id  dummy  ed
    #> 2  1 dummy2 ed1
    #> 3  3 dummy2 ed1
    #> 5  2 dummy1 ed2
    #> 8  4 dummy3 ed2
    

    Created on 2018-03-06 by the reprex package (v0.2.0).
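In current tidyr, gather() is superseded by pivot_longer(), and dplyr's mutate_at() by across(). A minimal sketch of the same reshaping with the newer verbs, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0; like the gather version, it assumes each row has exactly one 1 in each dummy group:

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  dummy1 = c(0L, 1L, 0L, 0L),
  dummy2 = c(1L, 0L, 1L, 0L),
  dummy3 = c(0L, 0L, 0L, 1L),
  ed1    = c(1, 0, 1, 0),
  ed2    = c(0, 1, 0, 1),
  id     = c(1, 2, 3, 4)
)

result <- df1 %>%
  # Recode 0 as NA so that values_drop_na = TRUE discards absent categories
  mutate(across(c(dummy1:dummy3, ed1:ed2), ~ na_if(.x, 0))) %>%
  # One dummy group at a time: the column name becomes the category
  pivot_longer(dummy1:dummy3, names_to = "dummy", values_to = "present",
               values_drop_na = TRUE) %>%
  pivot_longer(ed1:ed2, names_to = "ed", values_to = "present2",
               values_drop_na = TRUE) %>%
  select(id, dummy, ed) %>%
  arrange(id)

result
```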
