Question
How do I use the formula interface if I want dummies with custom values, e.g. values 1 and 2 rather than 0 and 1? The estimation might look like the following, where supp is a factor variable:
fit <- lm(len ~ dose + supp, data = ToothGrowth)
In this example the different values are not much use, but in many cases of a "re-written" model they can be useful.
EDIT: Actually, I have e.g. 3 levels and want the two columns to be coded differently, so that one is a 1/0 variable and the other is a 1/2 variable. The example above only has two levels.
Answer 1:
You can set the contrasts to whatever you want by creating the matrix you want to use and either passing it to the contrasts argument of lm or setting it as the default contrast of the factor itself.
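For the two-level case in the question, a minimal sketch might look like the following. It assumes the built-in ToothGrowth data, where supp has the levels OJ and VC; the name code12 and the column label "12" are chosen here for illustration only.

```r
# ToothGrowth ships with R (datasets package); supp is a factor with levels OJ, VC.
# Hypothetical one-column coding: OJ -> 1, VC -> 2 (instead of the default 0/1).
code12 <- matrix(c(1, 2), ncol = 1, dimnames = list(NULL, "12"))

# Default treatment coding (0/1 dummy for supp)
fit01 <- lm(len ~ dose + supp, data = ToothGrowth)

# Same model with the custom 1/2 coding passed through the contrasts argument
fit12 <- lm(len ~ dose + supp, data = ToothGrowth,
            contrasts = list(supp = code12))

# Compare the two sets of coefficients
coef(fit01)
coef(fit12)
```

Because the 1/2 column is just the 0/1 dummy plus one, the supp coefficient should be the same in both fits, and the intercept of fit12 should equal the intercept of fit01 minus that coefficient.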
Some sample data:
set.seed(6)
d <- data.frame(g=gl(3,5,labels=letters[1:3]), x=round(rnorm(15,50,20)))
The contrasts you have in mind:
mycontrasts <- matrix(c(0,0,1,0,1,1), byrow=TRUE, nrow=3)
colnames(mycontrasts) <- c("12","23")
mycontrasts
#     12 23
#[1,]  0  0
#[2,]  1  0
#[3,]  1  1
Then you use this in the lm call:
> lm(x ~ g, data=d, contrasts=list(g=mycontrasts))
Call:
lm(formula = x ~ g, data = d, contrasts = list(g = mycontrasts))
Coefficients:
(Intercept)          g12          g23
       58.8        -13.6          5.8
We can check that it does the right thing by comparing the coefficients with the differences between successive group means:
> diff(tapply(d$x, d$g, mean))
    b     c
-13.6   5.8
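Another way to check the coding is to look at the design matrix that lm builds. The sketch below repeats the sample data and contrast matrix from above so that it runs on its own; the variable name X is introduced here for illustration.

```r
# Sample data and contrast matrix, as defined in the answer
set.seed(6)
d <- data.frame(g = gl(3, 5, labels = letters[1:3]), x = round(rnorm(15, 50, 20)))
mycontrasts <- matrix(c(0, 0, 1, 0, 1, 1), byrow = TRUE, nrow = 3)
colnames(mycontrasts) <- c("12", "23")

# Build the design matrix with the custom coding
# (model.matrix calls the argument contrasts.arg, not contrasts)
X <- model.matrix(x ~ g, data = d, contrasts.arg = list(g = mycontrasts))

# One distinct row per level: intercept column plus the custom columns
unique(X)
```

The g12 and g23 columns of the three distinct rows should reproduce the rows of mycontrasts for levels a, b and c.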
The default (treatment) contrast uses the first level as the baseline, so gc is the difference between c and a (-13.6 + 5.8 = -7.8):
> lm(x ~ g, data=d)
Call:
lm(formula = x ~ g, data = d)
Coefficients:
(Intercept)           gb           gc
       58.8        -13.6         -7.8
But that can be changed by assigning to the factor with the contrasts function:
> contrasts(d$g) <- mycontrasts
> lm(x ~ g, data=d)
Call:
lm(formula = x ~ g, data = d)
Coefficients:
(Intercept)          g12          g23
       58.8        -13.6          5.8
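The same mechanism should cover the coding asked about in the EDIT, with one 1/0 column and one 1/2 column for three levels. The sketch below is one possible reading of that request, not the asker's actual design: which levels receive which values, and the names coding, is_b and c_12, are assumptions made for illustration.

```r
# Sample data, as defined in the answer
set.seed(6)
d <- data.frame(g = gl(3, 5, labels = letters[1:3]), x = round(rnorm(15, 50, 20)))

# Hypothetical coding: first column is 1/0 (1 for level b only),
# second column is 1/2 (2 for level c, 1 otherwise)
coding <- cbind(is_b = c(0, 1, 0), c_12 = c(1, 1, 2))

# Together with the intercept the columns must be linearly independent,
# otherwise the coefficients are not identified
stopifnot(qr(cbind(1, coding))$rank == 3)

fit_custom  <- lm(x ~ g, data = d, contrasts = list(g = coding))
fit_default <- lm(x ~ g, data = d)

# Different coefficients, same fitted group means
coef(fit_custom)
all.equal(fitted(fit_custom), fitted(fit_default))
```

With this particular coding, is_b works out to mean(b) - mean(a), c_12 to mean(c) - mean(a), and the intercept to 2 * mean(a) - mean(c). The fit itself is unchanged; only the meaning of the coefficients differs.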
Source: https://stackoverflow.com/questions/9616742/r-and-factor-coding-in-formula