R model.matrix using same factor set among all columns

Posted by 依然范特西╮ on 2019-12-12 19:21:16

Question


I have a set of basketball lineup data with five columns, each sharing the same factor, like so:

head(dat)
              V1             V2            V3            V4              V5
1   MILES,KEATON KINGSLEY,MOSES  BELL,ANTHLON HANNAHS,DUSTY   DURHAM,JABRIL
2   MILES,KEATON KINGSLEY,MOSES  BELL,ANTHLON HANNAHS,DUSTY   DURHAM,JABRIL
3 KINGSLEY,MOSES   BELL,ANTHLON HANNAHS,DUSTY DURHAM,JABRIL   THOMPSON,TREY
4 KINGSLEY,MOSES   BELL,ANTHLON HANNAHS,DUSTY THOMPSON,TREY     BEARD,ANTON
5  THOMPSON,TREY    BEARD,ANTON KOUASSI,WILLY   WHITT,JIMMY WATKINS,MANUALE
6  THOMPSON,TREY    BEARD,ANTON KOUASSI,WILLY   WHITT,JIMMY WATKINS,MANUALE

What I want to do is have each row be a dummy encoding of the current factors shown on the row, like this:

MILES,KEATON  KINGSLEY,MOSES  BELL,ANTHLON  HANNAHS,DUSTY  DURHAM,JABRIL THOMPSON,TREY  BEARD,ANTON  KOUASSI,WILLY  WHITT,JIMMY  WATKINS,MANUALE
           1               1             1              1              1             0            0               0             0               0
           1               1             1              1              1             0            0               0             0               0
           0               1             1              1              1             1            0               0             0               0

However, model.matrix seems to work on one column at a time; it won't let me share a single factor set across multiple columns. Following some advice in another thread, I tried:

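library(Matrix)  # provides sparse.model.matrix
df <- as.data.frame(lapply(dat, as.factor))
fList <- lapply(names(df), reformulate, intercept = FALSE)
mList <- lapply(fList, sparse.model.matrix, data = df)
br <- do.call(cbind, mList)  # originally cBind(), which Matrix has since deprecated in favour of cbind()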
head(br)
6 x 31 sparse Matrix of class "dgCMatrix"
   [[ suppressing 31 column names ‘V1BEARD,ANTON’, ‘V1BELL,ANTHLON’, ‘V1KINGSLEY,MOSES’ ... ]]

1 . . . 1 . . . . 1 . . 1 . . . . . . 1 . . . . . . 1 . . . . .
2 . . . 1 . . . . 1 . . 1 . . . . . . 1 . . . . . . 1 . . . . .
3 . . 1 . . . 1 . . . . . . 1 . . . 1 . . . . . . . . . . . 1 .
4 . . 1 . . . 1 . . . . . . 1 . . . . . . . 1 . . 1 . . . . . .
5 . . . . 1 1 . . . . . . . . 1 . . . . . . . . 1 . . . . . . 1
6 . . . . 1 1 . . . . . . . . 1 . . . . . . . . 1 . . . . . . 1

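This builds a separate set of dummy columns for each source column and prefixes every level name with the column name (e.g. V1BEARD,ANTON), instead of producing one shared set of player columns. What do I do?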


Answer 1:


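We can use mtabulate from the qdapTools package (df1 here stands for the lineup data frame from the question). Transposing first turns each lineup into its own column, so mtabulate counts the players per lineup: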

library(qdapTools)
mtabulate(as.data.frame(t(df1)))
# BELL,ANTHLON DURHAM,JABRIL HANNAHS,DUSTY KINGSLEY,MOSES MILES,KEATON THOMPSON,TREY BEARD,ANTON KOUASSI,WILLY
#1            1             1             1              1            1             0           0             0
#2            1             1             1              1            1             0           0             0
#3            1             1             1              1            0             1           0             0
#4            1             0             1              1            0             1           1             0
#5            0             0             0              0            0             1           1             1
#6            0             0             0              0            0             1           1             1
#  WATKINS,MANUALE WHITT,JIMMY
#1               0           0
#2               0           0
#3               0           0
#4               0           0
#5               1           1
#6               1           1

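Or using base R. unlist stacks the columns one after another, so the row index is repeated once per column to line up with it:

table(rep(seq_len(nrow(df1)), ncol(df1)), unlist(df1))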
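
For reference, here is a minimal self-contained sketch of the base R approach. It assumes the six rows shown in the question, retyped by hand as character columns; the names players, rows, lvls, tab and dummies are made up for illustration. It also orders the columns by first appearance reading row by row, which is the order the question's desired output uses, rather than alphabetically.

```r
# Hypothetical reconstruction of head(dat) from the question.
dat <- data.frame(
  V1 = c("MILES,KEATON", "MILES,KEATON", "KINGSLEY,MOSES",
         "KINGSLEY,MOSES", "THOMPSON,TREY", "THOMPSON,TREY"),
  V2 = c("KINGSLEY,MOSES", "KINGSLEY,MOSES", "BELL,ANTHLON",
         "BELL,ANTHLON", "BEARD,ANTON", "BEARD,ANTON"),
  V3 = c("BELL,ANTHLON", "BELL,ANTHLON", "HANNAHS,DUSTY",
         "HANNAHS,DUSTY", "KOUASSI,WILLY", "KOUASSI,WILLY"),
  V4 = c("HANNAHS,DUSTY", "HANNAHS,DUSTY", "DURHAM,JABRIL",
         "THOMPSON,TREY", "WHITT,JIMMY", "WHITT,JIMMY"),
  V5 = c("DURHAM,JABRIL", "DURHAM,JABRIL", "THOMPSON,TREY",
         "BEARD,ANTON", "WATKINS,MANUALE", "WATKINS,MANUALE"),
  stringsAsFactors = FALSE
)

# Flatten column by column, and build the matching row index.
players <- as.vector(as.matrix(dat))
rows <- rep(seq_len(nrow(dat)), times = ncol(dat))

# One level set shared by all columns, in order of first appearance by row.
lvls <- unique(as.vector(t(as.matrix(dat))))

# Cross-tabulate row index against player, then reduce counts to 0/1.
tab <- table(rows, factor(players, levels = lvls))
dummies <- matrix(as.integer(tab > 0), nrow = nrow(tab),
                  dimnames = list(NULL, lvls))

print(dummies)
```

Because every column is mapped onto the same level set before tabulating, the result has one column per player, with no V1/V2 prefixes. Reducing the counts to 0/1 only matters if the same player could appear twice in one row; in this data each count is already 0 or 1.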


Source: https://stackoverflow.com/questions/38413194/r-model-matrix-using-same-factor-set-among-all-columns
