Question
I have a multivariate data frame and want to convert its categorical columns to dummy variables. I used model.matrix, but it does not quite work. Please see the example below:
age = c(1:15) #numeric
sex = c(rep(0,7),rep(1,8)); sex = as.factor(sex) #factor
bloodtype = c(rep('A',2),rep('B',8),rep('O',1),rep('AB',4));bloodtype = as.factor(bloodtype) #factor
bodyweight = c(11:25) #numeric
wholedata = data.frame(cbind(age,sex,bloodtype,bodyweight))
model.matrix(~.,data=wholedata)[,-1]
I did not use model.matrix(~age+sex+bloodtype+bodyweight)[,-1] because this is just a toy example. In the real data I could have tens or hundreds more columns, so typing out every variable name is not practical.
Thanks
Answer 1:
It's the cbind that's messing things up. It converts your factors to numerics, which model.matrix then treats as ordinary numeric columns instead of expanding them into dummy variables.
If you just do wholedata = data.frame(age,sex,bloodtype,bodyweight) there should be no problem.
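As a minimal sketch of that fix applied to the toy data from the question (the name dummies is only an illustrative choice, not something from the original post):

```r
# Same toy data as in the question
age <- 1:15
sex <- factor(c(rep(0, 7), rep(1, 8)))
bloodtype <- factor(c(rep("A", 2), rep("B", 8), rep("O", 1), rep("AB", 4)))
bodyweight <- 11:25

# Build the data frame directly (no cbind), so the factors stay factors
wholedata <- data.frame(age, sex, bloodtype, bodyweight)

# Expand every column with the `.` shorthand, then drop the intercept column
dummies <- model.matrix(~ ., data = wholedata)[, -1]

colnames(dummies)
# "age" "sex1" "bloodtypeAB" "bloodtypeB" "bloodtypeO" "bodyweight"
```

With the default treatment contrasts, the first level of each factor (here "0" and "A") serves as the baseline and gets no column of its own.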
cbind returns a matrix, and in a matrix everything must have the same type. In this example the factors are converted to their integer codes (the underlying representation of a factor), so the resulting matrix is of type integer.
Try
wholedata = cbind(age,sex,bloodtype,bodyweight)
is.integer(wholedata) ## TRUE
is.factor(wholedata[,2]) ## FALSE
wholedata = data.frame(age,sex,bloodtype,bodyweight)
is.integer(wholedata) ## FALSE
is.factor(wholedata[,2]) ## TRUE
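To make the "underlying representation" point concrete, here is a small sketch showing what is left of bloodtype after cbind. The names codes and m are illustrative only.

```r
bloodtype <- factor(c(rep("A", 2), rep("B", 8), rep("O", 1), rep("AB", 4)))

levels(bloodtype)               # "A" "AB" "B" "O"

# The integer codes index into the levels above
codes <- as.integer(bloodtype)
unique(codes)                   # 1 3 4 2

# cbind keeps only those codes; the labels and the factor class are gone
m <- cbind(bloodtype)
typeof(m)                       # "integer"
all(m[, 1] == codes)            # TRUE
```

Once the column holds only the codes 1 to 4, model.matrix has no way to know they stand for categories, so it produces a single numeric column.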
Source: https://stackoverflow.com/questions/25412897/r-change-categorical-data-to-dummy-variables