Cox proportional hazard model

问题

I am trying to run Cox proportional hazard model on a data of 4 groups. Here's the data:

I am using this code:

time_Allo_NHL<- c(28,32,49,84,357,933,1078,1183,1560,2114,2144)
censor_Allo_NHL<- c(rep(1,5), rep(0,6))

time_Auto_NHL<- c(42,53,57,63,81,140,176,210,252,476,524,1037)
censor_Auto_NHL<- c(rep(1,7), rep(0,1), rep(1,1), rep(0,1), rep(1,1), rep(0,1))

time_Allo_HOD<- c(2,4,72,77,79)
censor_Allo_HOD<- c(rep(1,5))

time_Auto_HOD<- c(30,36,41,52,62,108,132,180,307,406,446,484,748,1290,1345)
censor_Auto_HOD<- c(rep(1,7), rep(0,8))


myData <- data.frame(time=c(time_Allo_NHL, time_Auto_NHL, time_Allo_HOD, time_Auto_HOD),
                     censor=c(censor_Allo_NHL, censor_Auto_NHL, censor_Allo_HOD, censor_Auto_HOD),
                     group= rep(1:4,), each= )
str(myData)

The problem is each group has different number of observations. What I should modify in the code :

myData <- data.frame(time=c(time_Allo_NHL, time_Auto_NHL, time_Allo_HOD, time_Auto_HOD),
                     censor=c(censor_Allo_NHL, censor_Auto_NHL, censor_Allo_HOD,                                           
                     censor_Auto_HOD), group= rep(1:4,), each= )

Instead of writing each=# so I can run the code properly in order to complete doing the Cox proportional hazard model?

Then I have attempted to run a Cox proportional hazard model using the following code:

library(survival)

for(i in 1:43){
  if (myData$group[i]==2)
    myData$Z1[i]<-1
  else myData$Z1[i]<-0
}

for(i in 1:43){
  if (myData$group[i]==3)
    myData$Z2[i]<-1
  else myData$Z2[i]<-0
}

for(i in 1:43){
  if (myData$group[i]==4)
    myData$Z3[i]<-1
  else myData$Z3[i]<-0
}

myData

Coxfit<-coxph(Surv(time,censor)~Z1+Z2+Z3, data = myData)
summary(Coxfit)

This is all I got. There's no valuse!!

Next, I want to test for an interaction between type of transplant and disease type using main effects and interaction terms.

The code I'm going to use:

n<-length(myData$time)
n

for (i in 1:n){
  if (myData$(here?)[i]==2)
    myData$W1[i] <-1
  else myData$W1[i]<-0
}

for (i in 1:n){
  if (myData$(here?)[i]==2)
    myData$W2[i] <-1
  else myData$W2[i]<-0
}

myData

Coxfit.W<-coxph(Surv(time,censor)~W1+W2+W1*W2, data = myData)
summary(Coxfit.W)

I'm not sure what it should be written in here (myData$(here?) from the above code.

回答1:

This looks like the bone marrow transplant study at Ohio State University.

As you mentioned, each group has different numbers of observations per group. I would consider binding the rows from each subgroup together in the end.

First, would create a data frame for each group. I would add a column indicating which group they belonged to. So, for example, in df_Allo_NHL would have all of the observations have Allo NHL for group:

df_Allo_NHL <- data.frame(group = "Allo NHL", 
                          time = c(28,32,49,84,357,933,1078,1183,1560,2114,2144),
                          censor = c(rep(1,5), rep(0,6)))

Or just adding to the 2 vectors you have already:

df_Allo_NHL <- data.frame(group = "Allo NHL", time = time_Allo_NHL, censor = censor_Allo_NHL)

Then once you have your 4 data frames, you can combine them. One way to do this is by using Reduce and putting all your data frames in a list. The final result should be ready for cox proportional hazards analysis, in long form, and you will have group available to include. (Edit: Z1 and Z2 added from table for model.)

time_Allo_NHL<- c(28,32,49,84,357,933,1078,1183,1560,2114,2144)
censor_Allo_NHL<- c(rep(1,5), rep(0,6))
df_Allo_NHL <- data.frame(group = "Allo NHL", 
                          time = time_Allo_NHL,
                          censor = censor_Allo_NHL,
                          Z1 = c(90,30,40,60,70,90,100,90,80,80,90),
                          Z2 = c(24,7,8,10,42,9,16,16,20,27,5))

time_Auto_NHL<- c(42,53,57,63,81,140,176,210,252,476,524,1037)
censor_Auto_NHL<- c(rep(1,7), rep(0,1), rep(1,1), rep(0,1), rep(1,1), rep(0,1))
df_Auto_NHL <- data.frame(group = "Auto NHL", 
                          time = time_Auto_NHL, 
                          censor = censor_Auto_NHL,
                          Z1 = c(80,90,30,60,50,100,80,90,90,90,90,90),
                          Z2 = c(19,17,9,13,12,11,38,16,21,24,39,84))

time_Allo_HOD<- c(2,4,72,77,79)
censor_Allo_HOD<- c(rep(1,5))
df_Allo_HOD <- data.frame(group = "Allo HOD", 
                          time = time_Allo_HOD, 
                          censor = censor_Allo_HOD,
                          Z1 = c(20,50,80,60,70),
                          Z2 = c(34,28,59,102,71))

time_Auto_HOD<- c(30,36,41,52,62,108,132,180,307,406,446,484,748,1290,1345)
censor_Auto_HOD<- c(rep(1,7), rep(0,8))
df_Auto_HOD <- data.frame(group = "Auto HOD", 
                          time = time_Auto_HOD, 
                          censor = censor_Auto_HOD,
                          Z1 = c(90,80,70,60,90,70,60,100,100,100,100,90,90,90,80),
                          Z2 = c(73,61,34,18,40,65,17,61,24,48,52,84,171,20,98))

myData <- Reduce(rbind, list(df_Allo_NHL, df_Auto_NHL, df_Allo_HOD, df_Auto_HOD))

Edit

If you go ahead and also add Z1 (Karnofsky Score) and Z2 (waiting time from diagnosis to transplant), you can do the CPH survival model like this below. group is already a factor and the first level Allo NHL would by default be there reference category.

library(survival)

Coxfit<-coxph(Surv(time,censor)~group+Z1+Z2, data = myData)
summary(Coxfit)

Output

Call:
coxph(formula = Surv(time, censor) ~ group + Z1 + Z2, data = myData)

  n= 43, number of events= 26 

                  coef exp(coef) se(coef)      z Pr(>|z|)    
groupAuto NHL  0.77357   2.16748  0.58631  1.319  0.18704    
groupAllo HOD  2.73673  15.43639  0.94081  2.909  0.00363 ** 
groupAuto HOD  1.06293   2.89485  0.63494  1.674  0.09412 .  
Z1            -0.05052   0.95074  0.01222 -4.135 3.55e-05 ***
Z2            -0.01660   0.98354  0.01002 -1.656  0.09769 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

              exp(coef) exp(-coef) lower .95 upper .95
groupAuto NHL    2.1675    0.46136    0.6869    6.8395
groupAllo HOD   15.4364    0.06478    2.4419   97.5818
groupAuto HOD    2.8948    0.34544    0.8340   10.0481
Z1               0.9507    1.05181    0.9282    0.9738
Z2               0.9835    1.01674    0.9644    1.0030

Concordance= 0.783  (se = 0.059 )
Likelihood ratio test= 32.48  on 5 df,   p=5e-06
Wald test            = 28.48  on 5 df,   p=3e-05
Score (logrank) test = 39.45  on 5 df,   p=2e-07

Data

      group time censor  Z1  Z2
1  Allo NHL   28      1  90  24
2  Allo NHL   32      1  30   7
3  Allo NHL   49      1  40   8
4  Allo NHL   84      1  60  10
5  Allo NHL  357      1  70  42
6  Allo NHL  933      0  90   9
7  Allo NHL 1078      0 100  16
8  Allo NHL 1183      0  90  16
9  Allo NHL 1560      0  80  20
10 Allo NHL 2114      0  80  27
11 Allo NHL 2144      0  90   5
12 Auto NHL   42      1  80  19
13 Auto NHL   53      1  90  17
14 Auto NHL   57      1  30   9
15 Auto NHL   63      1  60  13
16 Auto NHL   81      1  50  12
17 Auto NHL  140      1 100  11
18 Auto NHL  176      1  80  38
19 Auto NHL  210      0  90  16
20 Auto NHL  252      1  90  21
21 Auto NHL  476      0  90  24
22 Auto NHL  524      1  90  39
23 Auto NHL 1037      0  90  84
24 Allo HOD    2      1  20  34
25 Allo HOD    4      1  50  28
26 Allo HOD   72      1  80  59
27 Allo HOD   77      1  60 102
28 Allo HOD   79      1  70  71
29 Auto HOD   30      1  90  73
30 Auto HOD   36      1  80  61
31 Auto HOD   41      1  70  34
32 Auto HOD   52      1  60  18
33 Auto HOD   62      1  90  40
34 Auto HOD  108      1  70  65
35 Auto HOD  132      1  60  17
36 Auto HOD  180      0 100  61
37 Auto HOD  307      0 100  24
38 Auto HOD  406      0 100  48
39 Auto HOD  446      0 100  52
40 Auto HOD  484      0  90  84
41 Auto HOD  748      0  90 171
42 Auto HOD 1290      0  90  20
43 Auto HOD 1345      0  80  98

来源：https://stackoverflow.com/questions/60919837/cox-proportional-hazard-model

标签

dataframe

cox-regression