Creating categorical variables from mutually exclusive dummy variables

忘了有多久 2021-01-11 23:51

My question builds on a previously answered question about combining multiple dummy variables into a single categorical variable.

In the question p…

3 answers
  • 2021-01-12 00:09

    I think you can do this simply with ifelse(), something like:

    factor1 <- ifelse(is.na(conditionA), conditionB, conditionA)
    

    Another way:

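    factor1 <- conditionA
    # fill only the missing positions, taking conditionB from the same rows
    missingA <- is.na(factor1)
    factor1[missingA] <- conditionB[missingA]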
    

    A third solution, more practical if you have more than two condition columns:

    # note: rows where every listed column is NA get 0 here, not NA
    factor1 <- apply(df[, c("conditionA", "conditionB")], 1, sum, na.rm = TRUE)
    
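A small sketch on toy data (assumed, not taken from the question) that runs the three approaches side by side. They agree wherever one of the two columns is non-NA; on a row where both are NA, the first two return NA while the apply()/sum() version returns 0, which matters if some rows match neither condition.

```r
# Toy data (assumed): each row has at most one non-NA condition value,
# and the last row matches neither condition
df <- data.frame(conditionA = c(NA, 1, NA, 2, NA),
                 conditionB = c(1, NA, 2, NA, NA))

# 1. ifelse(): take conditionA unless it is missing
f1 <- with(df, ifelse(is.na(conditionA), conditionB, conditionA))

# 2. Copy conditionA, then fill its gaps from the same rows of conditionB
f2 <- df$conditionA
missingA <- is.na(f2)
f2[missingA] <- df$conditionB[missingA]

# 3. Row-wise sum ignoring NA (an all-NA row sums to 0)
f3 <- apply(df[, c("conditionA", "conditionB")], 1, sum, na.rm = TRUE)

print(data.frame(f1, f2, f3))
```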
  • 2021-01-12 00:19

    I think this function gives you what you need (admittedly, this is a quick hack).

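    # For each row, return its single non-NA value if that value sits in one
    # of the columns named in `grp`; otherwise return NA
    to_indicator <- function(tbl, grp)
    {
        apply(tbl, 1,
              function (row)
              {
                  idx <- which(!is.na(row))
                  nm <- names(idx)
                  if (length(idx) == 1 && nm %in% grp)
                    row[[idx]]
                  else
                    NA
              })
    }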
    

    And here is how it is used with the example data you provided:

    tbl <- read.table(header=TRUE, text="
    conditionA    conditionB    conditionC     conditionD
    NA            1             NA             NA
    1             NA            NA             NA
    NA            NA            1              NA
    NA            NA            NA             1
    NA            2             NA             NA
    2             NA            NA             NA
    NA            NA            2              NA
    NA            NA            NA             2")
    
    (tbl <- cbind(tbl,
                  factor1=to_indicator(tbl, c("conditionA", "conditionB")),
                  factor2=to_indicator(tbl, c("conditionC", "conditionD"))))
    
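If the dummy columns within each group are mutually exclusive (at most one non-NA value per row among the columns in grp, as in the example data), the row-by-row apply() call can be replaced by vectorised rowSums(). The helper name to_indicator_vec is hypothetical; this is a sketch under that assumption, not a drop-in replacement for every input.

```r
# Hypothetical vectorised variant of to_indicator(); ASSUMES at most one
# non-NA value per row among the columns in `grp`
to_indicator_vec <- function(tbl, grp) {
  sub <- as.matrix(tbl[, grp, drop = FALSE])
  # Sum the non-NA values in the group columns (at most one per row by assumption)
  vals <- rowSums(sub, na.rm = TRUE)
  # Rows with no non-NA value in this group become NA instead of 0
  vals[rowSums(!is.na(sub)) == 0] <- NA
  unname(vals)
}

tbl <- read.table(header = TRUE, text = "
conditionA conditionB conditionC conditionD
NA 1 NA NA
1 NA NA NA
NA NA 1 NA
NA NA NA 1
NA 2 NA NA
2 NA NA NA
NA NA 2 NA
NA NA NA 2")

tbl$factor1 <- to_indicator_vec(tbl, c("conditionA", "conditionB"))
tbl$factor2 <- to_indicator_vec(tbl, c("conditionC", "conditionD"))
print(tbl)
```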
  • 2021-01-12 00:21

    Update (2019): please use dplyr::coalesce() instead; it works in much the same way.

    My R package has a convenience function that picks, element by element, the first non-NA value across a list of vectors:

    #library(devtools)
    #install_github("muelleki/kimisc")
    library(kimisc)
    
    df$factor1 <- with(df, coalesce.na(conditionA, conditionB))
    

    (I'm not sure whether this works if conditionA and conditionB are factors. If necessary, convert them to numeric first with as.numeric(as.character(...)).)
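For readers without kimisc or dplyr, the same first-non-NA idea can be written in base R with Reduce(). coalesce_base is a hypothetical name, not part of any package; the sketch assumes all vectors have the same length and compatible types, and converts factors to numeric first as the note above suggests.

```r
# Hypothetical base-R helper: element-wise first non-NA value across vectors,
# similar in spirit to dplyr::coalesce() and kimisc::coalesce.na()
coalesce_base <- function(...) {
  Reduce(function(x, y) {
    # Fill the still-missing positions of x from the same positions of y
    miss <- is.na(x)
    x[miss] <- y[miss]
    x
  }, list(...))
}

# Factor columns (assumed data): convert via as.character() before as.numeric()
df <- data.frame(conditionA = factor(c(NA, 1, NA, 2)),
                 conditionB = factor(c(1, NA, 2, NA)))
to_num <- function(f) as.numeric(as.character(f))
df$factor1 <- with(df, coalesce_base(to_num(conditionA), to_num(conditionB)))
print(df$factor1)

# Works the same way with more than two vectors
print(coalesce_base(c(NA, NA, 3), c(NA, 2, 9), c(1, 5, 7)))
```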

    Alternatively, you could try interaction() and then recode the levels of the resulting factor, although it looks to me like you are more interested in the first solution:

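    # drop = TRUE keeps only the combinations that occur; with 0/1 dummies where
    # each row has exactly one 1, these are "1.0" (A) and "0.1" (B), in that order
    df$conditionAB <- with(df, interaction(coalesce.na(conditionA, 0),
                                           coalesce.na(conditionB, 0),
                                           drop = TRUE))
    levels(df$conditionAB) <- c('A', 'B')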
    
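If what you actually want is a factor recording which condition column is non-NA (the 'A'/'B' labels suggest this), it can also be built directly without enumerating interaction() levels. This is a swapped-in base-R technique using max.col(), not the answer's method, and which_condition is a hypothetical helper name; it assumes at most one non-NA column per row and takes the first one otherwise.

```r
# Hypothetical helper: label each row with its (first) non-NA column among
# `cols`; rows where all of them are NA get NA
which_condition <- function(df, cols, labels = cols) {
  present <- !is.na(as.matrix(df[, cols, drop = FALSE]))
  lab <- rep(NA_character_, nrow(df))
  hit <- rowSums(present) > 0
  # max.col() on a 0/1 matrix with ties.method = "first" gives the index
  # of the first column holding a 1 in each row
  lab[hit] <- cols[max.col(present[hit, , drop = FALSE] + 0, ties.method = "first")]
  factor(lab, levels = cols, labels = labels)
}

# Toy data (assumed)
df <- data.frame(conditionA = c(NA, 1, NA, 2, NA),
                 conditionB = c(1, NA, 2, NA, NA))
df$conditionAB <- which_condition(df, c("conditionA", "conditionB"),
                                  labels = c("A", "B"))
print(df)
```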