问题
I am fitting a cox model to some data that is structured as such:
str(test)
'data.frame': 147 obs. of 8 variables:
$ AGE : int 71 69 90 78 61 74 78 78 81 45 ...
$ Gender : Factor w/ 2 levels "F","M": 2 1 2 1 2 1 2 1 2 1 ...
$ RACE : Factor w/ 5 levels "","BLACK","HISPANIC",..: 5 2 5 5 5 5 5 5 5 1 ...
$ SIDE : Factor w/ 2 levels "L","R": 1 1 2 1 2 1 1 1 2 1 ...
$ LESION.INDICATION: Factor w/ 12 levels "CLAUDICATION",..: 1 11 4 11 9 1 1 11 11 11 ...
$ RUTH.CLASS : int 3 5 4 5 4 3 3 5 5 5 ...
$ LESION.TYPE : Factor w/ 3 levels "","OCCLUSION",..: 3 3 2 3 3 3 2 3 3 3 ...
$ Primary : int 1190 1032 166 689 219 840 1063 115 810 157 ...
the RUTH.CLASS
variable is actually a factor, and i've changed it to one as such:
> test$RUTH.CLASS <- as.factor(test$RUTH.CLASS)
> summary(test$RUTH.CLASS)
3 4 5 6
48 56 35 8
great.
after fitting the model
stent.surv <- Surv(test$Primary)
> cox.ruthclass <- coxph(stent.surv ~ RUTH.CLASS, data=test )
>
> summary(cox.ruthclass)
Call:
coxph(formula = stent.surv ~ RUTH.CLASS, data = test)
n= 147, number of events= 147
coef exp(coef) se(coef) z Pr(>|z|)
RUTH.CLASS4 0.1599 1.1734 0.1987 0.804 0.42111
RUTH.CLASS5 0.5848 1.7947 0.2263 2.585 0.00974 **
RUTH.CLASS6 0.3624 1.4368 0.3846 0.942 0.34599
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
RUTH.CLASS4 1.173 0.8522 0.7948 1.732
RUTH.CLASS5 1.795 0.5572 1.1518 2.796
RUTH.CLASS6 1.437 0.6960 0.6762 3.053
Concordance= 0.574 (se = 0.026 )
Rsquare= 0.045 (max possible= 1 )
Likelihood ratio test= 6.71 on 3 df, p=0.08156
Wald test = 7.09 on 3 df, p=0.06902
Score (logrank) test = 7.23 on 3 df, p=0.06478
> levels(test$RUTH.CLASS)
[1] "3" "4" "5" "6"
When i fit more variables in the model, similar things happen:
cox.fit <- coxph(stent.surv ~ RUTH.CLASS + LESION.INDICATION + LESION.TYPE, data=test )
>
> summary(cox.fit)
Call:
coxph(formula = stent.surv ~ RUTH.CLASS + LESION.INDICATION +
LESION.TYPE, data = test)
n= 147, number of events= 147
coef exp(coef) se(coef) z Pr(>|z|)
RUTH.CLASS4 -0.5854 0.5569 1.1852 -0.494 0.6214
RUTH.CLASS5 -0.1476 0.8627 1.0182 -0.145 0.8847
RUTH.CLASS6 -0.4509 0.6370 1.0998 -0.410 0.6818
LESION.INDICATIONEMBOLIC -0.4611 0.6306 1.5425 -0.299 0.7650
LESION.INDICATIONISCHEMIA 1.3794 3.9725 1.1541 1.195 0.2320
LESION.INDICATIONISCHEMIA/CLAUDICATION 0.2546 1.2899 1.0189 0.250 0.8027
LESION.INDICATIONREST PAIN 0.5302 1.6993 1.1853 0.447 0.6547
LESION.INDICATIONTISSUE LOSS 0.7793 2.1800 1.0254 0.760 0.4473
LESION.TYPEOCCLUSION -0.5886 0.5551 0.4360 -1.350 0.1770
LESION.TYPESTEN -0.7895 0.4541 0.4378 -1.803 0.0714 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
RUTH.CLASS4 0.5569 1.7956 0.05456 5.684
RUTH.CLASS5 0.8627 1.1591 0.11726 6.348
RUTH.CLASS6 0.6370 1.5698 0.07379 5.499
LESION.INDICATIONEMBOLIC 0.6306 1.5858 0.03067 12.964
LESION.INDICATIONISCHEMIA 3.9725 0.2517 0.41374 38.141
LESION.INDICATIONISCHEMIA/CLAUDICATION 1.2899 0.7752 0.17510 9.503
LESION.INDICATIONREST PAIN 1.6993 0.5885 0.16645 17.347
LESION.INDICATIONTISSUE LOSS 2.1800 0.4587 0.29216 16.266
LESION.TYPEOCCLUSION 0.5551 1.8015 0.23619 1.305
LESION.TYPESTEN 0.4541 2.2023 0.19250 1.071
Concordance= 0.619 (se = 0.028 )
Rsquare= 0.137 (max possible= 1 )
Likelihood ratio test= 21.6 on 10 df, p=0.01726
Wald test = 22.23 on 10 df, p=0.01398
Score (logrank) test = 23.46 on 10 df, p=0.009161
> levels(test$LESION.INDICATION)
[1] "CLAUDICATION" "EMBOLIC" "ISCHEMIA" "ISCHEMIA/CLAUDICATION"
[5] "REST PAIN" "TISSUE LOSS"
> levels(test$LESION.TYPE)
[1] "" "OCCLUSION" "STEN"
truncated output from model.matrix
below:
> model.matrix(cox.fit)
RUTH.CLASS4 RUTH.CLASS5 RUTH.CLASS6 LESION.INDICATIONEMBOLIC LESION.INDICATIONISCHEMIA
1 0 0 0 0 0
2 0 1 0 0 0
We can see that the the first level of each of these is being excluded from the model. Any input would be greatly appreciated. I noticed that on the LESION.TYPE
variable, the blank level ""
is not being included, but that is not by design - that should be a NA
or something similar.
I'm confused and could use some help with this. Thanks.
回答1:
Factors in any model return coefficients based on a base level (a contrast).Your contrasts
default to a base factor. There is no point in calculating a coefficient for the dropped value because the model will return the predictions when that dropped value = 1 given that all the other factor values are 0 (factors are complete and mutually exclusive for every observation). You can alter your default contrast by changing the contrasts
in your options
.
For your coefficients to be versus an average of all factors:
options(contrasts=c(unordered="contr.sum", ordered="contr.poly"))
For your coefficients to be versus a specific treatment (what you have above and your default):
options(contrasts=c(unordered="contr.treatment", ordered="contr.poly"))
As you can see there are two types of factors in R: unordered (or categorical, e.g. red, green, blue) and ordered (e.g. strongly disagree, disagree, no opinion, agree, strongly agree)
来源:https://stackoverflow.com/questions/21367259/r-cox-hazard-model-not-including-levels-of-a-factor