What is an efficient way (any solution, including non-base packages, is welcome) to collapse dummy variables back into a factor?
race.White race.Hispanic race
We can use max.col to get the column index, subset the column names based on that, and use sub to remove the prefix.
sub('[^.]+\\.', '', names(dat)[max.col(dat)])
#[1] "White" "Asian" "White" "Black" "Asian" "Hispanic"
#[7] "White" "White" "White" "Black"
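For a reproducible run, here is a minimal sketch on a small made-up data frame (the values are hypothetical, not the asker's data). The final factor() call is an addition, on the assumption that a factor rather than a character vector is wanted:

```r
# Hypothetical data: one indicator column per level, exactly one 1 per row
dat <- data.frame(
  race.White    = c(1, 0, 0, 0, 1),
  race.Black    = c(0, 0, 1, 0, 0),
  race.Asian    = c(0, 1, 0, 0, 0),
  race.Hispanic = c(0, 0, 0, 1, 0)
)

idx    <- max.col(dat)                          # position of the 1 in each row
labels <- sub('[^.]+\\.', '', names(dat)[idx])  # strip the "race." prefix
labels
#[1] "White"    "Asian"    "Black"    "Hispanic" "White"

# Assumption: keep the level order of the original columns
race <- factor(labels, levels = sub('[^.]+\\.', '', names(dat)))
```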
Here, I assumed that there is exactly one 1 in each row. If there are multiple 1s, we can use the option ties.method='first' or ties.method='last'.
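To see what the tie-breaking option does, this sketch uses a hypothetical matrix whose first row has two 1s. The default is ties.method='random', so without setting it the result for such a row can change between runs:

```r
# Hypothetical indicator matrix
m <- rbind(c(1, 0, 1, 0),   # two 1s: columns 1 and 3 tie
           c(0, 1, 0, 0))   # a single 1 in column 2

first <- max.col(m, ties.method = 'first')  # picks column 1 for row 1
last  <- max.col(m, ties.method = 'last')   # picks column 3 for row 1
```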
Another option is to matrix-multiply (%*%) the data by the sequence of column indices, subset the column names with the result, and remove the prefix with sub.
sub('[^.]+\\.', '', names(dat)[(as.matrix(dat) %*% seq_along(dat))[, 1]])
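The matrix product works because, with exactly one 1 per row, the dot product of a row with 1, 2, ..., k equals the position of that 1. A small check on hypothetical data (not the asker's), under that one-1-per-row assumption:

```r
# Hypothetical data with exactly one 1 per row
dat <- data.frame(race.White = c(1, 0, 0),
                  race.Black = c(0, 0, 1),
                  race.Asian = c(0, 1, 0))

# Row 2 is (0, 0, 1), so 0*1 + 0*2 + 1*3 = 3
idx <- (as.matrix(dat) %*% seq_along(dat))[, 1]
idx  # 1 3 2, the same indices max.col(dat) returns here
```

If a row contains several 1s, the product is the sum of their positions, and an all-zero row gives 0, which indexing silently drops. This variant therefore leans on the assumption more heavily than max.col does.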
Or we can use pmax, after multiplying each column by its index:
sub('[^.]+\\.', '', names(dat)[do.call(pmax, dat * seq_along(dat)[col(dat)])])
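Broken into steps, the pmax version multiplies each column by its own index and then takes the row-wise maximum. A sketch on hypothetical data, using Map in place of the col(dat) recycling purely for readability (a substitution, not the code above):

```r
# Hypothetical data with exactly one 1 per row
dat <- data.frame(race.White = c(1, 0, 0),
                  race.Black = c(0, 0, 1),
                  race.Asian = c(0, 1, 0))

# Column j becomes 0 or j
weighted <- Map(`*`, dat, seq_along(dat))

# Parallel maximum across the columns gives the index of the 1 in each row
idx <- do.call(pmax, unname(weighted))
labels <- sub('[^.]+\\.', '', names(dat)[idx])
```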