Efficient Collapse Dummy Variables

闹比i 2021-01-18 07:00

What is an efficient way (solutions using non-base packages are welcome) to collapse dummy variables back into a factor?

   race.White race.Hispanic race         


        
2 Answers
  •  慢半拍i (original poster)
    2021-01-18 07:49

    We can use max.col to get, for each row, the index of the column that holds the maximum (the 1), use that index to subset the column names, and strip the prefix with sub.

    sub('[^.]+\\.', '', names(dat)[max.col(dat)])
    #[1] "White"    "Asian"    "White"    "Black"    "Asian"    "Hispanic"
    #[7] "White"    "White"    "White"    "Black"  
    

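    Here I assume there is exactly one 1 in each row. If a row can contain several 1s, pass ties.method='first' or ties.method='last' to max.col to make the choice deterministic; the default, ties.method='random', picks one of the tied columns at random.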
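To make the effect of ties.method concrete, here is a minimal sketch. The data frame `dat` below is hypothetical (the original example data is not shown above), and row 4 deliberately contains two 1s.

```r
# Hypothetical data: names and values are invented for illustration only.
dat <- data.frame(
  race.White    = c(1, 0, 0, 1),
  race.Black    = c(0, 1, 0, 0),
  race.Hispanic = c(0, 0, 1, 1)   # row 4 has two 1s on purpose
)

m <- as.matrix(dat)

# "first" picks the leftmost tied column, "last" the rightmost.
idx_first <- max.col(m, ties.method = "first")
idx_last  <- max.col(m, ties.method = "last")

# Strip everything up to and including the first dot from the column names.
race_first <- sub("[^.]+\\.", "", names(dat)[idx_first])
race_last  <- sub("[^.]+\\.", "", names(dat)[idx_last])

race_first  # "White" "Black" "Hispanic" "White"
race_last   # "White" "Black" "Hispanic" "Hispanic"
```

Only the ambiguous row differs between the two results, so comparing them is also a quick way to spot rows with more than one 1.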


    Another option is to matrix-multiply (%*%) the data by the sequence of column indices, which gives the position of the 1 in each row, then subset the column names with that position and remove the prefix with sub.

    sub('[^.]+\\.', '', names(dat)[(as.matrix(dat) %*% seq_along(dat))[, 1]])
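As a rough illustration of why the matrix product yields a column index, the sketch below uses a small hypothetical `dat` with exactly one 1 per row. The data and the one-hot check are assumptions added here, not part of the original answer.

```r
# Hypothetical one-hot data: exactly one 1 in every row.
dat <- data.frame(
  race.White    = c(1, 0, 0),
  race.Black    = c(0, 1, 0),
  race.Hispanic = c(0, 0, 1)
)

# Guard the assumption this approach depends on.
stopifnot(all(rowSums(dat) == 1))

# Each row is multiplied by c(1, 2, 3) and summed, so a row with its
# single 1 in column j produces the value j.
idx <- (as.matrix(dat) %*% seq_along(dat))[, 1]
idx  # 1 2 3

race <- sub("[^.]+\\.", "", names(dat)[idx])
race  # "White" "Black" "Hispanic"
```

If a row held two 1s, the product would be the sum of their positions and would point at the wrong column, so this variant relies on the one-hot assumption more strictly than max.col does.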
    

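    Or we can multiply each column by its index and use pmax to take the row-wise maximum, which again gives the position of the 1: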

    sub('[^.]+\\.', '', names(dat)[do.call(pmax, dat * seq_along(dat)[col(dat)])])
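The pmax variant can be unpacked step by step as follows. This is a sketch on a hypothetical `dat`. Because the question asks for a factor rather than a character vector, the last step wraps the result in factor(), using the stripped column names as levels (that choice of levels is an assumption made here).

```r
# Hypothetical data: one 1 per row.
dat <- data.frame(
  race.White    = c(1, 0, 0, 1),
  race.Black    = c(0, 1, 0, 0),
  race.Hispanic = c(0, 0, 1, 0)
)

# col(dat) gives the column number of every cell, so this multiplies
# column j by j: a 1 in column j becomes j, zeros stay zero.
scaled <- dat * seq_along(dat)[col(dat)]

# pmax over the columns is the row-wise maximum, i.e. the index of the 1.
idx <- do.call(pmax, scaled)

# Keep every category as a level, even ones that do not occur in the data.
lvls <- sub("[^.]+\\.", "", names(dat))
race <- factor(lvls[idx], levels = lvls)

race
```

Passing levels explicitly keeps the level order aligned with the original column order, which plain factor(lvls[idx]) would replace with alphabetical order.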
    
