Question
Custom contrasts are very widely used in analyses, e.g.: "Do DV values at level 1 and level 3 of this three-level factor differ significantly?"
Intuitively, this contrast is expressed in terms of cell means as:
c(1,0,-1)
One or more of these contrasts, bound as columns, form a contrast coefficient matrix, e.g.
mat = matrix(ncol = 2, byrow = TRUE, data = c(
1, 0,
0, 1,
-1, -1)
)
     [,1] [,2]
[1,]    1    0
[2,]    0    1
[3,]   -1   -1
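To make "expressed in terms of cell means" concrete, here is a minimal sketch. The cell means are made-up numbers and the variable names are only for illustration; the point is that each column of the coefficient matrix is a weighted sum of the cell means.

```r
# Hypothetical cell means for a three-level factor (made-up numbers).
cell_means <- c(level1 = 10, level2 = 12, level3 = 15)

# A single contrast: level 1 minus level 3.
contrast_1_vs_3 <- c(1, 0, -1)
estimate <- sum(contrast_1_vs_3 * cell_means)   # 10 - 15

# Both columns of the coefficient matrix at once.
mat <- matrix(c( 1,  0,
                 0,  1,
                -1, -1), ncol = 2, byrow = TRUE)

# One value per contrast: level1 - level3 and level2 - level3.
estimates <- drop(t(mat) %*% cell_means)
```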
However, when it comes to running these contrasts specified by the coefficient matrix, there is a lot of (apparently contradictory) information on the web and in books. My question is which information is correct?
Claim 1: contrasts(factor) takes a coefficient matrix
In some examples, the user is shown that the intuitive contrast coefficient matrix can be used directly via the contrasts() or C() functions. So it's as simple as:
contrasts(myFactor) <- mat
Claim 2: Transform coefficients to create a coding scheme
Elsewhere (e.g. UCLA stats) we are told the coefficient matrix (or basis matrix) must be transformed into a contrast matrix (a coding scheme) before use. This involves taking the inverse of the transpose of the coefficient matrix, (mat')⁻¹, or, in R-ish:
contrasts(myFactor) = solve(t(mat))
This method requires padding the matrix with an initial constant column for the intercept, so that it is square and can be inverted. To avoid this, some sites recommend using a generalized inverse function that can cope with non-square matrices, i.e., MASS::ginv():
contrasts(myFactor) = ginv(t(mat))
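As a quick check of how the two variants of Claim 2 relate, the sketch below pads the question's mat with a constant column, inverts the transpose, and compares the result with ginv(t(mat)). The constant 1/3 and the names padded, coding_solve and coding_ginv are my own choices for this demo. As far as I can tell, the agreement relies on every contrast summing to zero.

```r
library(MASS)  # for ginv(); MASS ships with standard R installations

mat <- matrix(c( 1,  0,
                 0,  1,
                -1, -1), ncol = 2, byrow = TRUE)

# Square route: prepend a constant column for the intercept, invert the
# transpose, then drop the intercept column of the result.
padded <- cbind(1/3, mat)
coding_solve <- solve(t(padded))[, -1]

# Non-square route: generalized inverse of the transposed coefficients.
coding_ginv <- ginv(t(mat))

# TRUE when the two routes agree up to floating-point error.
same_coding <- isTRUE(all.equal(coding_solve, coding_ginv,
                                check.attributes = FALSE))
```

Either result is a 3 x 2 matrix (one row per level), which is the shape contrasts() expects for a three-level factor.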
Claim 3: premultiply by the transpose, take the inverse, and postmultiply by the transpose
Elsewhere again (e.g. a note from SPSS support), we learn the correct algebra is: (mat'mat)⁻¹ mat'
Implying to me that the correct way to create the contrasts matrix should be:
x = solve(t(mat)%*% mat)%*% t(mat)
     [,1] [,2] [,3]
[1,]    0    0    1
[2,]    1    0   -1
[3,]    0    1   -1
contrasts(myFactor) = x
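For what it's worth, here is a sketch of what that formula gives when applied to the 3 x 2 mat defined at the top of the question (the names transpose_matches_ginv, implied and rows_match are mine). Two things seem worth checking: whether the transpose of x matches the ginv() result from Claim 2, and whether the rows of x are the hypotheses you would implicitly test if you used mat directly as the coding (Claim 1).

```r
library(MASS)  # for ginv()

mat <- matrix(c( 1,  0,
                 0,  1,
                -1, -1), ncol = 2, byrow = TRUE)

# Claim 3 formula: one ROW per contrast, so the result is 2 x 3.
x <- solve(t(mat) %*% mat) %*% t(mat)

# Does the transpose of x match the Claim 2 coding from ginv()?
transpose_matches_ginv <- isTRUE(all.equal(t(x), ginv(t(mat)),
                                           check.attributes = FALSE))

# Hypotheses implied by using mat itself as the coding scheme:
# invert [intercept | coding] and drop the intercept row.
implied <- solve(cbind(1, mat))[-1, ]
rows_match <- isTRUE(all.equal(implied, x, check.attributes = FALSE))
```

Since x has one row per contrast rather than one row per factor level, it would presumably need transposing before it could be assigned with contrasts().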
My question is, which is right (if I am interpreting and describing each piece of advice accurately)? How does one specify custom contrasts in R for lm, lme, etc.?
Answer 1:
Claim 2 is correct (see the answers here and here) and sometimes claim 1, too. This is because there are cases in which the generalized inverse of the (transposed) coefficient matrix is equal to the matrix itself.
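A small sketch of one such case, under my own assumption that the contrasts are mutually orthogonal and scaled to unit length (the matrices ortho and raw are made up for the demo). With unit-length orthogonal columns the generalized inverse of the transpose reproduces the matrix; with unscaled orthogonal columns it only rescales each column.

```r
library(MASS)  # for ginv()

# Two orthogonal contrasts scaled to unit length (assumption for this demo).
ortho <- cbind(c(1, -1,  0) / sqrt(2),
               c(1,  1, -2) / sqrt(6))
unchanged <- isTRUE(all.equal(ginv(t(ortho)), ortho,
                              check.attributes = FALSE))

# The same contrasts without scaling: each column is divided by its
# sum of squares, so the coding is proportional to the coefficients.
raw <- cbind(c(1, -1, 0), c(1, 1, -2))
coding_raw <- ginv(t(raw))
rescaled <- sweep(raw, 2, colSums(raw^2), "/")
only_rescaled <- isTRUE(all.equal(coding_raw, rescaled,
                                  check.attributes = FALSE))
```

In the merely proportional case the tests should be unaffected, but the size of the estimated coefficients changes.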
Answer 2:
For what it's worth....
If you have a factor with 3 levels (levels A, B, and C) and you want to test the following orthogonal contrasts: A vs B, and the avg. of A and B vs C, your contrast codes would be:
Cont1<- c(1,-1, 0)
Cont2<- c(.5,.5, -1)
If you do as directed on the UCLA site (transform coefficients to make a coding scheme), as such:
contrasts(Variable) <- solve(t(cbind(c(1,1,1), Cont1, Cont2)))[,2:3]
then your tests are IDENTICAL (same t statistics and p-values; the estimates differ only by a constant rescaling) to what you get if you create two dummy variables, e.g.:
Dummy1 <- ifelse(Variable=="A", 1, ifelse(Variable=="B", -1, 0))
Dummy2 <- ifelse(Variable=="A", .5, ifelse(Variable=="B", .5, -1))
and enter them both into the regression equation instead of your factor, which makes me inclined to think that this is the correct way.
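Here is a self-contained sketch of that comparison on simulated data (the data, the seed and the names fit_factor, fit_dummy, est_factor and est_target are made up for illustration). It fits both models, compares the t statistics, and checks that the coefficients from the transformed coding equal the intended contrasts of the cell means.

```r
set.seed(1)  # simulated data, purely for illustration
Variable <- factor(rep(c("A", "B", "C"), each = 10))
y <- rnorm(30, mean = rep(c(10, 12, 15), each = 10))

Cont1 <- c(1, -1, 0)
Cont2 <- c(.5, .5, -1)

# Route 1: coding scheme derived from the contrast coefficients.
contrasts(Variable) <- solve(t(cbind(c(1, 1, 1), Cont1, Cont2)))[, 2:3]
fit_factor <- lm(y ~ Variable)

# Route 2: hand-made dummy variables holding the raw coefficients.
Dummy1 <- ifelse(Variable == "A", 1, ifelse(Variable == "B", -1, 0))
Dummy2 <- ifelse(Variable == "A", .5, ifelse(Variable == "B", .5, -1))
fit_dummy <- lm(y ~ Dummy1 + Dummy2)

# The t statistics of the two non-intercept terms should agree.
t_factor <- summary(fit_factor)$coefficients[2:3, "t value"]
t_dummy  <- summary(fit_dummy)$coefficients[2:3, "t value"]
same_t <- isTRUE(all.equal(unname(t_factor), unname(t_dummy)))

# Route 1 estimates should equal the contrasts of the observed cell means.
cell_means <- tapply(y, Variable, mean)
est_factor <- unname(coef(fit_factor)[2:3])
est_target <- c(cell_means[["A"]] - cell_means[["B"]],
                (cell_means[["A"]] + cell_means[["B"]]) / 2 - cell_means[["C"]])
```

If this sketch is right, the practical difference between the two routes is interpretability: with the transformed coding the coefficients are the mean differences themselves, whereas the dummy-variable coefficients are those differences divided by a constant.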
PS I don't write the most elegant R code, but it gets the job done. Sorry, I'm sure there are easier ways to recode variables, but you get the gist.
Answer 3:
I'm probably missing something, but in each of your three examples, you specify the contrast matrix in the same way, i.e.
## Note: the function name is the plural, contrasts
contrasts(myFactor) = x
The only thing that differs is the value of x.
Using the data from the UCLA website as an example
hsb2 = read.table('http://www.ats.ucla.edu/stat/data/hsb2.csv', header=TRUE, sep=",")
#creating the factor variable race.f
hsb2$race.f = factor(hsb2$race, labels=c("Hispanic", "Asian", "African-Am", "Caucasian"))
We can specify either the treatment version of the contrasts
contrasts(hsb2$race.f) = contr.treatment(4)
summary(lm(write ~ race.f, hsb2))
or the sum version
contrasts(hsb2$race.f) = contr.sum(4)
summary(lm(write ~ race.f, hsb2))
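To see which comparisons of cell means these built-in codings correspond to, one option is to invert the coding matrix with an intercept column prepended. The helper name implied_hypotheses is hypothetical (not part of base R); this is only a sketch of the idea.

```r
# Hypothetical helper: each row of the result gives the weights on the
# cell means that the corresponding model coefficient estimates.
implied_hypotheses <- function(coding) {
  solve(cbind(1, coding))
}

treatment_rows <- implied_hypotheses(contr.treatment(4))
sum_rows       <- implied_hypotheses(contr.sum(4))

# Row 1 is the intercept; rows 2 to 4 are the three contrast coefficients.
round(treatment_rows, 2)
round(sum_rows, 2)
```

Under treatment coding the rows compare each level with level 1; under sum coding they compare each of the first three levels with the unweighted mean of all four.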
Alternatively, we can specify a bespoke contrast matrix.
See ?contr.sum for other standard contrasts.
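As a sketch of the bespoke route, the example below uses simulated data as a stand-in for hsb2 (the data frame sim, the contrast names and the numbers are all made up). It builds a coefficient matrix comparing each group with the last one, converts it to a coding scheme with MASS::ginv(), and passes it through the contrasts argument of lm() rather than modifying the factor.

```r
library(MASS)  # for ginv()

set.seed(42)  # simulated stand-in for the hsb2 data, for illustration only
groups <- c("Hispanic", "Asian", "African-Am", "Caucasian")
sim <- data.frame(
  race.f = factor(rep(groups, times = c(8, 6, 7, 9)), levels = groups)
)
sim$write <- rnorm(nrow(sim), mean = 50, sd = 10)

# Coefficient matrix: one column per contrast, one row per level.
coef_mat <- cbind(H_vs_C  = c(1, 0, 0, -1),
                  As_vs_C = c(0, 1, 0, -1),
                  Af_vs_C = c(0, 0, 1, -1))

# Coding scheme obtained from the coefficients.
coding <- ginv(t(coef_mat))
colnames(coding) <- colnames(coef_mat)

fit <- lm(write ~ race.f, data = sim, contrasts = list(race.f = coding))

# The non-intercept coefficients should equal the contrasts of cell means.
cell_means <- as.numeric(tapply(sim$write, sim$race.f, mean))
expected <- drop(t(coef_mat) %*% cell_means)
estimates <- coef(fit)[-1]
```

Passing the coding through the contrasts argument leaves the factor in the data frame untouched, which can be handy when the same factor is used with different contrasts in different models.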
Source: https://stackoverflow.com/questions/31818174/custom-contrasts-in-r-contrast-coefficient-matrix-or-contrast-matrix-coding-s