Coxph predictions don't match the coefficients

社会主义新天地 提交于 2019-12-02 08:03:34

问题


Good afternoon,

I could post reproducible code and certainly will if everyone agrees that something is wrong, but right now I think my question is quite simple and someone will point me the right path.

I am working in a data set like this:

created_as_free_user     t     c
                 <fctr> <int> <int>
1                  true    36     0
2                  true    36     0
3                  true     0     1
4                  true    28     0
5                  true     9     0
6                  true     0     1
7                  true    13     0
8                  true    19     0
9                  true     9     0
10                 true    16     0

I fitted a Cox Regression model like this:

fit_train = coxph(Surv(time = t,event = c) ~ created_as_free_user ,data = teste)
summary(fit_train)

And received:

Call:
coxph(formula = Surv(time = t, event = c) ~ created_as_free_user, 
    data = teste)

  n= 9000, number of events= 1233 

                            coef exp(coef) se(coef)      z Pr(>|z|)    
created_as_free_usertrue -0.7205    0.4865   0.1628 -4.426 9.59e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                         exp(coef) exp(-coef) lower .95 upper .95
created_as_free_usertrue    0.4865      2.055    0.3536    0.6693

Concordance= 0.511  (se = 0.002 )
Rsquare= 0.002   (max possible= 0.908 )
Likelihood ratio test= 15.81  on 1 df,   p=7e-05
Wald test            = 19.59  on 1 df,   p=9.589e-06
Score (logrank) test = 20.45  on 1 df,   p=6.109e-06

So far so good. Next step: Predict the results on new data. I understand the different types of predictions that predict.coxph can give me (or at least I think I do). Let's use the type = "lp":

head(predict(fit_train,validacao,type = "lp"),n=20)

And get:

     1           2           3           4           5           6           7           8           9          10 
-0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 
         11          12          13          14          15          16          17          18          19          20 
-0.01208854 -0.01208854  0.70842049 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 

OK. But when I look at the data that I am trying to estimate:

# A tibble: 9,000 × 3
   created_as_free_user     t     c
                 <fctr> <int> <int>
1                  true    20     0
2                  true    12     0
3                  true     0     1
4                  true    10     0
5                  true    51     0
6                  true    36     0
7                  true    44     0
8                  true     0     1
9                  true    27     0
10                 true     6     0
# ... with 8,990 more rows

It makes me confuse....

The type = "lp" isn't suppose to give you the linear predictions? For this data above that I am trying to estimate, since the created_as_free_user variable is equal to true, Am I wrong expecting the type = "lp" prediction to be exactaly -0.7205 (the coefficient of the model above)? Where does the -0.01208854 came? I suspect it's some sort of scale situation, but couldn't find the answer online.

My final objective is the h(t) that is given by the prediction type = "expected", but I am not all that comfortable using it because it uses this -0.01208854 value that I don't fully understand.

Thanks a lot


回答1:


The Details section in ?predict.coxph reads:

The Cox model is a relative risk model; predictions of type "linear predictor", "risk", and "terms" are all relative to the sample from which they came. By default, the reference value for each of these is the mean covariate within strata.

To illustrate what this means, we can look at a simple example. Some fake data:

test1 <- list(time=c(4,3,1,1,1), 
             status=c(1,1,1,0,0), 
             x=c(0,2,1,1,0)) 

We fit a model and view predictions:

fit <- coxph(Surv(time, status) ~ x, test1) 
predict(fit, type = "lp")
# [1] -0.6976630  1.0464945  0.1744157  0.1744157 -0.6976630

The predictions are the same as:

(test1$x - mean(test1$x)) * coef(fit)
# [1] -0.6976630  1.0464945  0.1744157  0.1744157 -0.6976630

(Using this logic and some arithmetic we can back out from your results that you have 8849 "trues" out of 9000 observations for your created_as_free_user variable.)



来源:https://stackoverflow.com/questions/42843239/coxph-predictions-dont-match-the-coefficients

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!