Creating categorical variables from mutually exclusive dummy variables

Question

My question builds on a previously answered question about combining multiple dummy variables into a single categorical variable.

In that earlier question, the categorical variable was created from dummy variables that were NOT mutually exclusive. In my case, the dummy variables are mutually exclusive because they represent crossed experimental conditions in a 2x2 between-subjects factorial design (which also has a within-subjects component that I'm not addressing here), so I don't think interaction() does what I need.

For example, my data might look like this:

id   conditionA    conditionB    conditionC     conditionD
1    NA            1             NA             NA
2    1             NA            NA             NA
3    NA            NA            1              NA
4    NA            NA            NA             1
5    NA            2             NA             NA
6    2             NA            NA             NA
7    NA            NA            2              NA
8    NA            NA            NA             2

I'd now like to make categorical variables that combine ACROSS different condition columns. For example, people who had values for conditions A and B might be coded with one categorical variable, and people who had values for conditions C and D with another:

id   conditionA    conditionB    conditionC     conditionD  factor1    factor2
1    NA            1             NA             NA          1          NA
2    1             NA            NA             NA          1          NA
3    NA            NA            1              NA          NA         1
4    NA            NA            NA             1           NA         1
5    NA            2             NA             NA          2          NA
6    2             NA            NA             NA          2          NA
7    NA            NA            2              NA          NA         2
8    NA            NA            NA             2           NA         2

Right now, I'm doing this using ifelse() statements, which quite simply is a hot mess (and doesn't always work). Please help! There's probably some super-obvious "easier way."

EDIT:

The kinds of ifelse commands that I am using are as follows:

attach(df)
df$factor<-ifelse(conditionA==1 | conditionB==1, 1, NA)
df$factor<-ifelse(conditionA==2 | conditionB==2, 2, df$factor)

In reality, I'm combining across 6-8 columns each time, so a more elegant solution would help a lot.
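A small base-R illustration of why chained ifelse() calls like the ones above can fail, using a hypothetical two-row data frame `toy`. In R, `NA | TRUE` is TRUE but `NA | FALSE` is NA, and ifelse() returns NA wherever its test is NA, so the second call can overwrite values set by the first. Using `%in%`, which returns FALSE rather than NA for missing values, avoids this in the sketch below.

```r
# Hypothetical toy data: row 1 has conditionA = 1, row 2 has conditionB = 2
toy <- data.frame(conditionA = c(1, NA), conditionB = c(NA, 2))

# First pass: row 1 gets 1; row 2's test is NA | FALSE, which is NA
toy$factor <- ifelse(toy$conditionA == 1 | toy$conditionB == 1, 1, NA)

# Second pass: row 1's test is FALSE | NA, which is NA, so ifelse()
# returns NA there and the 1 assigned above is lost
toy$factor <- ifelse(toy$conditionA == 2 | toy$conditionB == 2, 2, toy$factor)
print(toy$factor)

# %in% never returns NA, so this version keeps both values
toy$fixed <- ifelse(toy$conditionA %in% 1 | toy$conditionB %in% 1, 1, NA)
toy$fixed <- ifelse(toy$conditionA %in% 2 | toy$conditionB %in% 2, 2, toy$fixed)
print(toy$fixed)
```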

Answer 1:

Update (2019): Please use dplyr::coalesce() instead; it works in much the same way.

My R package has a convenience function that picks the first non-NA value for each element across a list of vectors:

#library(devtools)
#install_github('kimisc', 'muelleki')
library(kimisc)

df$factor1 <- with(df, coalesce.na(conditionA, conditionB))

(I'm not sure whether this works if conditionA and conditionB are factors. If necessary, convert them to numeric first with as.numeric(as.character(...)).)
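If you would rather not depend on an extra package, the same first-non-NA idea can be sketched in base R with Reduce(), which folds ifelse() across any number of columns and so also covers the 6-8 column case from the question. The helper name `coalesce_cols` and the `groups` list are hypothetical, chosen for illustration; the data are the example from the question.

```r
# Hypothetical helper: for each row, take the first non-NA value across `cols`
coalesce_cols <- function(df, cols) {
  Reduce(function(acc, col) {
    # keep values already found; fill the remaining NAs from the next column
    ifelse(is.na(acc), col, acc)
  }, df[cols])
}

# Example data from the question
df <- data.frame(
  id = 1:8,
  conditionA = c(NA, 1, NA, NA, NA, 2, NA, NA),
  conditionB = c(1, NA, NA, NA, 2, NA, NA, NA),
  conditionC = c(NA, NA, 1, NA, NA, NA, 2, NA),
  conditionD = c(NA, NA, NA, 1, NA, NA, NA, 2)
)

# Hypothetical grouping: each new column lists the condition columns it combines
groups <- list(
  factor1 = c("conditionA", "conditionB"),
  factor2 = c("conditionC", "conditionD")
)

# Build every combined column in one step
df[names(groups)] <- lapply(groups, function(cols) coalesce_cols(df, cols))
print(df)
```

Adding more columns to a group, or more groups to the list, needs no further ifelse() calls.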

Otherwise, you could give interaction() a try, combined with recoding the levels of the resulting factor, but it looks to me like you're more interested in the first solution:

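df$conditionAB <- with(df, interaction(coalesce.na(conditionA, 0), 
                                       coalesce.na(conditionB, 0)))
# interaction() creates one level per combination (e.g. "1.0" means A = 1, B = 0),
# so map combinations to new labels; unlisted levels (such as "0.0") become NA
levels(df$conditionAB) <- list(A = c("1.0", "2.0"), B = c("0.1", "0.2"))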

Answer 2:

I think this function gives you what you need (admittedly, this is a quick hack).

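to_indicator <- function(tbl, grp)
{
    # For each row, find the single non-NA column; return its value if
    # that column is in grp, otherwise NA
    apply(tbl, 1,
          function (row)
          {
              idx <- which(!is.na(row))
              nm <- names(idx)
              if (length(nm) == 1 && nm %in% grp)
                row[[idx]]
              else
                NA
          })
}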

And here is how it is used with the example data you provided.

tbl <- read.table(header=TRUE, text="
conditionA    conditionB    conditionC     conditionD
NA            1             NA             NA
1             NA            NA             NA
NA            NA            1              NA
NA            NA            NA             1
NA            2             NA             NA
2             NA            NA             NA
NA            NA            2              NA
NA            NA            NA             2")
tbl <- data.frame(tbl)

(tbl <- cbind(tbl,
              factor1=to_indicator(tbl, c("conditionA", "conditionB")),
              factor2=to_indicator(tbl, c("conditionC", "conditionD"))))
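Because apply() loops over rows one at a time, a vectorized variant of the same idea may be faster on larger data. This is an alternative sketch, not part of the original answer, and `to_indicator_vec` is a hypothetical name. It uses max.col() to locate the non-NA column in each row and matrix indexing to pull out its value; like to_indicator(), it assumes at most one non-NA entry per row.

```r
# Hypothetical vectorized variant of to_indicator(): find the non-NA column
# in each row and keep its value only if that column is in grp
to_indicator_vec <- function(tbl, grp) {
  m <- as.matrix(tbl)
  # column index of the first non-NA entry in each row
  # (all-NA rows fall back to column 1, whose value is NA anyway)
  j <- max.col((!is.na(m)) * 1, ties.method = "first")
  # extract that entry from each row with (row, column) matrix indexing
  val <- m[cbind(seq_len(nrow(m)), j)]
  ifelse(colnames(m)[j] %in% grp, val, NA)
}

tbl <- data.frame(
  conditionA = c(NA, 1, NA, NA, NA, 2, NA, NA),
  conditionB = c(1, NA, NA, NA, 2, NA, NA, NA),
  conditionC = c(NA, NA, 1, NA, NA, NA, 2, NA),
  conditionD = c(NA, NA, NA, 1, NA, NA, NA, 2)
)
tbl$factor1 <- to_indicator_vec(tbl[1:4], c("conditionA", "conditionB"))
tbl$factor2 <- to_indicator_vec(tbl[1:4], c("conditionC", "conditionD"))
print(tbl)
```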

Answer 3:

Well, I think you can do it simply with ifelse(), something like:

factor1 <- ifelse(is.na(conditionA), conditionB, conditionA)

Another way would be:

factor1 <- conditionA
# fill only the missing positions, using the matching elements of conditionB
factor1[is.na(factor1)] <- conditionB[is.na(factor1)]

And a third solution, certainly more practical if you have more than two condition columns:

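factor1 <- rowSums(df[, c("conditionA", "conditionB")], na.rm = TRUE)
# rows with no value in either column sum to 0; set them to NA
# (assumes 0 is never a valid condition code)
factor1[factor1 == 0] <- NA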
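Every approach in this thread assumes that each row has a value in at most one condition column: a sum-based version would silently add two values together, and a first-non-NA version would silently drop one. A quick base-R check of that assumption on the question's example data might look like this (the `cols` vector is illustrative; list whichever columns you are combining):

```r
df <- data.frame(
  conditionA = c(NA, 1, NA, NA, NA, 2, NA, NA),
  conditionB = c(1, NA, NA, NA, 2, NA, NA, NA),
  conditionC = c(NA, NA, 1, NA, NA, NA, 2, NA),
  conditionD = c(NA, NA, NA, 1, NA, NA, NA, 2)
)
cols <- c("conditionA", "conditionB", "conditionC", "conditionD")

# number of non-missing condition columns in each row
n_set <- rowSums(!is.na(df[cols]))

# stop early if any row has values in more than one condition column
if (any(n_set > 1)) {
  stop("rows with more than one condition: ",
       paste(which(n_set > 1), collapse = ", "))
}
```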

Source: https://stackoverflow.com/questions/16135316/creating-categorical-variables-from-mutually-exclusive-dummy-variables

Tags

categorical-data

dummy-variable