Question
I am trying to convert the categorical variables into dummy variables. My data contains the variables "season", "holiday", "workingday", "weather", "temp", "atemp", "humidity", "windspeed", "registered", "count", "hour" and "dow".
Here is my code:
# create dummy variables with the dummies package
library(dummies)
# append the dummy columns for each categorical variable
data.new = data.frame(data)
data.new = cbind(data.new,dummy(data.new$season, sep = "_"))
data.new = cbind(data.new,dummy(data.new$holiday, sep = "_"))
data.new = cbind(data.new,dummy(data.new$weather, sep = "_"))
data.new = cbind(data.new,dummy(data.new$dow, sep = "_"))
data.new = cbind(data.new,dummy(data.new$hour, sep = "_"))
data.new = cbind(data.new,dummy(data.new$workingday, sep = "_"))
# delete the original categorical columns by name, so the result
# does not depend on column positions shifting after each deletion
old.vars = c("season", "holiday", "weather", "dow", "hour", "workingday")
data.new = data.new[, !(names(data.new) %in% old.vars)]
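If the dummies package is not available, a comparable set of 0/1 columns can be built with base R's model.matrix(). This is a minimal sketch on an invented toy data frame (the values and the helper vector cat.vars are illustrative, not from the question); the `~ 0 + x` formula drops the intercept so that every level of the factor gets its own column.

```r
# Invented toy data standing in for the real data set
toy = data.frame(
  season  = factor(c(1, 2, 3, 4, 1, 2)),
  weather = factor(c(1, 1, 2, 3, 2, 1)),
  temp    = c(9.8, 14.1, 25.4, 12.3, 8.2, 16.0)
)

# Hypothetical list of the categorical columns to expand
cat.vars = c("season", "weather")

for (v in cat.vars) {
  # "~ 0 + x" removes the intercept, so no level is dropped:
  # one 0/1 column per level of the factor
  d = model.matrix(~ 0 + x, data = data.frame(x = toy[[v]]))
  colnames(d) = paste(v, levels(toy[[v]]), sep = "_")
  toy = cbind(toy, d)
}

# Drop the original categorical columns by name
toy = toy[, !(names(toy) %in% cat.vars)]
str(toy)
```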
Should I delete the old variables after generating the dummy variables? If I want to do PCR, can I use all of the variables, e.g.
library(pls)
fit = pcr(count~.,data = data.new)
to fit a linear regression model? Or should I use only the non-dummy variables?
fit = pcr(count~temp+atemp+humidity+windspeed+registered,data = data.new)
Sorry for the confusion: I originally used the lm function as an example, and I have now changed it to the pcr function. Thank you for reading this question!
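For the PCR part of the question, this is a base-R sketch of what principal components regression does with dummy-coded predictors; it is not the pls implementation. It builds the predictor matrix from the formula (so the factor becomes dummy columns), runs PCA on it, and regresses the response on the first k component scores. The toy data are invented, and both k = 3 and standardizing the predictors (scale. = TRUE) are assumptions of the sketch; in practice k is usually picked by cross-validation, and whether to standardize 0/1 dummies is a modelling choice.

```r
set.seed(1)
n = 40

# Invented toy data: one factor plus two numeric predictors
toy = data.frame(
  season   = factor(rep(1:4, length.out = n)),
  temp     = runif(n, 5, 30),
  humidity = runif(n, 20, 90)
)
toy$count = 10 + 2 * toy$temp + 5 * (toy$season == "3") + rnorm(n)

# Predictor matrix from the formula: season becomes season2..season4;
# the intercept column is dropped before PCA
X = model.matrix(count ~ ., data = toy)[, -1]

# Principal components of the standardized predictors (assumption: scale. = TRUE)
pca = prcomp(X, center = TRUE, scale. = TRUE)

# Regress the response on the first k component scores
# (k = 3 is an arbitrary illustrative choice)
k = 3
Z = pca$x[, 1:k]
fit = lm(toy$count ~ Z)
summary(fit)$r.squared
```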
Answer 1:
As long as your categorical variables are factors, the lm function will handle the creation of dummy variables for you.
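To see what lm does with a factor, you can inspect the design matrix that model.matrix() builds from the same formula. In this sketch (invented toy data; only the column names echo the question), season has four levels, and R's default treatment contrasts turn it into three dummy columns, with level 1 absorbed into the intercept as the baseline. This is why there is no need to create or delete dummy columns by hand when the predictor is a factor.

```r
# Invented toy data
toy = data.frame(
  count  = c(16, 40, 32, 13, 20, 35, 28, 11),
  season = factor(c(1, 2, 3, 4, 1, 2, 3, 4)),
  temp   = c(9.8, 14.1, 25.4, 12.3, 8.2, 16.0, 22.1, 10.5)
)

# The design matrix built from the formula:
# (Intercept), season2, season3, season4, temp
X = model.matrix(count ~ season + temp, data = toy)
print(colnames(X))

# lm() uses the same expansion, so its coefficients have the same names
fit = lm(count ~ season + temp, data = toy)
print(coef(fit))
```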
I would recommend first verifying that your data is a data.frame and that the categorical predictors are indeed factors.
class(data)
sapply(data, class)
Or, more simply:
str(data)
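If str() reports categorical columns such as season or weather as int or num, R treats them as numeric in the formula instead of expanding them into dummies. A minimal sketch of converting them, assuming integer-coded columns in a small invented data frame (cat.vars is a hypothetical helper vector):

```r
# Invented toy data with integer-coded categorical columns
toy = data.frame(
  season     = c(1L, 2L, 3L, 4L),
  holiday    = c(0L, 0L, 1L, 0L),
  workingday = c(1L, 1L, 0L, 1L),
  weather    = c(1L, 2L, 1L, 3L),
  temp       = c(9.8, 14.1, 25.4, 12.3)
)

# Hypothetical list of the columns that should be categorical
cat.vars = c("season", "holiday", "workingday", "weather")

# Convert each listed column to a factor; temp stays numeric
toy[cat.vars] = lapply(toy[cat.vars], factor)

str(toy)
```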
Then simply put them in the formula of your lm call.
fit = lm(count ~ season + holiday + workingday + weather + temp + atemp + humidity + windspeed + registered + hour + dow, data=data)
Or, if the columns in the formula are the only ones in your data.frame, you can use the shorthand:
fit = lm(count ~ ., data=data)
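The shorthand uses every column except the response, so it only matches the explicit formula when nothing else is in the data.frame. As an illustration (invented toy data; season_2 is a hypothetical hand-made dummy), keeping a manually created dummy next to the original factor makes the two perfectly collinear, and lm() reports NA for the redundant coefficient. So with a factor in the data, use either the factor or its hand-made dummies, not both.

```r
# Invented toy data
toy = data.frame(
  count  = c(16, 40, 32, 13, 20, 35, 28, 11),
  season = factor(c(1, 2, 3, 4, 1, 2, 3, 4)),
  temp   = c(9.8, 14.1, 25.4, 12.3, 8.2, 16.0, 22.1, 10.5)
)

# "." expands to every column except the response: season + temp
fit1 = lm(count ~ ., data = toy)

# Add a hand-made dummy that duplicates information already in season
toy$season_2 = as.numeric(toy$season == "2")
fit2 = lm(count ~ ., data = toy)

# season_2 is identical to the season2 column lm() creates,
# so its coefficient is aliased and reported as NA
print(coef(fit2))
```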
Source: https://stackoverflow.com/questions/47820249/after-generating-dummy-variables