Question
I want to simulate left-truncated failure time data from a Weibull distribution.
My objective is to simulate the data and then recover the coefficients of x1, x2, x3, x4, and x5 used in the simulation by fitting a Weibull regression model. Here xt = runif(N, 30, 80) denotes the age at the start of the study (the left-truncation time), and Tm <- qweibull(runif(N, pweibull(xt, shape = 7.5, scale = 82*exp(lp)), 1), shape = 7.5, scale = 82*exp(lp)) denotes the failure time. But whenever I fit the regression, I get this warning message:
Warning message:
In Surv(xt, time_M, event_M) : Stop time must be > start time, NA created
This is what I tried:
library(survival)  # Surv()
library(eha)       # weibreg()

N <- 10^5
H <- within(data.frame(xt = runif(N, 30, 80), x1 = rnorm(N, 2, 1), x2 = rnorm(N, -2, 1)), {
  # Correlated continuous covariates
  x3 <- rnorm(N, 0.5*x1 + 0.5*x2, 2)
  x4 <- rnorm(N, 0.3*x1 + 0.3*x2 + 0.3*x3, 2)
  # Multinomial-logit probabilities for the four-level factor x5
  lp1 <- -2 + 0.5*x1 + 0.2*x2 + 0.1*x3 + 0.2*x4
  lp2 <- -2 + 0.5*x1 + 0.2*x2 + 0.1*x3 + 0.2*x4
  lp3 <- 0.5*x1 + 0.2*x2 + 0.1*x3 + 0.2*x4
  lp4 <- 0
  P1 <- exp(lp1)/(exp(lp2) + exp(lp3) + 1 + exp(lp1))
  P2 <- exp(lp2)/(exp(lp1) + exp(lp3) + 1 + exp(lp2))
  P3 <- exp(lp3)/(exp(lp2) + exp(lp1) + 1 + exp(lp3))
  P4 <- 1/(exp(lp2) + exp(lp3) + exp(lp1) + 1)
  mChoices <- t(apply(cbind(P1, P2, P3, P4), 1, rmultinom, n = 1, size = 1))
  x5 <- apply(mChoices, 1, function(x) which(x == 1))
  # Linear predictor for the failure time
  lp <- 0.05*x1 + 0.2*x2 + 0.1*x3 + 0.02*x4 + log(1.5)*(x5 == 1) + log(5)*(x5 == 2) + log(2)*(x5 == 3)
  # Failure time drawn from a Weibull truncated below at xt
  Tm <- qweibull(runif(N, pweibull(xt, shape = 7.5, scale = 82*exp(lp)), 1), shape = 7.5, scale = 82*exp(lp))
  # Administrative right censoring at 100
  Cens <- 100
  time_M <- pmin(Tm, Cens)
  event_M <- time_M == Tm
})
res.full_M <- weibreg(Surv(xt, time_M, event_M) ~ x1 + x2 + x3 + x4 + factor(x5), data = H)
Can anyone help me modify this code so that the starting age (xt) is always less than the corresponding failure time (time_M), and so that the fitted regression model has coefficient values close to those in the following equation?
lp <- 0.05*x1 + 0.2*x2 + 0.1*x3 + 0.02*x4 + log(1.5)*(x5==1) + log(5)*(x5==2) + log(2)*(x5==3)
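For reference, the Tm line reads as inverse-CDF sampling from a Weibull distribution truncated below at xt. A short sketch of that identity follows, assuming shape k = 7.5 and scale λ = 82·exp(lp) as in the code. The symbols a (standing for xt), k, λ, U, and F are notation introduced here, not part of the original code.

```latex
\begin{aligned}
F(t) &= 1 - \exp\!\left[-\left(\frac{t}{\lambda}\right)^{k}\right] \\
U &\sim \mathrm{Uniform}\bigl(F(a),\, 1\bigr) \\
T &= F^{-1}(U) = \lambda \bigl[-\log(1 - U)\bigr]^{1/k} \\
\Pr(T \le t) &= \frac{F(t) - F(a)}{1 - F(a)}, \qquad t > a
\end{aligned}
```

In exact arithmetic this gives T > a for every draw, so a stop time that is not above the start time would presumably have to come from floating-point rounding, plausibly when F(a) is very close to 1.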
Answer 1:
Your first comment implies that you want (possibly censored) times from age 30 to diagnosis. You have two options: work with "survival times", or work with the date of each patient's 30th birthday and their date of diagnosis. The former is easier, because it makes the censoring rate easier to specify.
1. Generate an uncensored survival time T from the distribution of your choice.
2. Draw a random number from a Uniform(0, 1) distribution. If this number is less than your censoring rate, the observation is censored: go to step 3. Otherwise, T is your uncensored observed survival time.
3. Draw another random variable X from a Uniform(0, 1) distribution and set T = T*X. This is your censored survival time.
This procedure will give you data from any distribution of survival times, censored at the rate of your choice.
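The three steps above can be sketched in R roughly as follows. This is a minimal illustration, not a definitive implementation: the sample size n, the censoring rate cens_rate, the seed, and the Weibull(shape = 7.5, scale = 82) placeholder distribution are all assumptions chosen for the example, and it ignores covariates and left truncation.

```r
set.seed(42)                      # arbitrary seed, for reproducibility only
n         <- 10000                # sample size (assumed)
cens_rate <- 0.3                  # target censoring proportion (assumed)

# Step 1: uncensored survival times; this Weibull is only a placeholder
t_true <- rweibull(n, shape = 7.5, scale = 82)

# Step 2: flag each observation as censored with probability cens_rate
censored <- runif(n) < cens_rate

# Step 3: shrink censored times by an independent Uniform(0, 1) factor
x    <- runif(n)
time <- ifelse(censored, t_true * x, t_true)

sim <- data.frame(time = time, event = !censored)
mean(!sim$event)                  # observed censoring proportion
```

The resulting sim data frame could then be passed to a survival model as Surv(time, event); the observed censoring proportion should land near cens_rate for large n.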
However, my reading of your specification tells me that every participant will at some point be diagnosed with the condition of interest. There are no competing risks. Is this reasonable?
Your second comment is confusing. Is your time to event (a) "time from age 30 to diagnosis" (which would imply right censoring) or (b) "time from onset of disease until diagnosis" (which would imply left censoring and could also involve right censoring)? If (a), my solution still holds. If (b), you need to supply more information:
- What's the process (distribution) of time from age 30 to onset of disease?
- When/How frequently are diagnostic procedures conducted?
- What's the chance of a diagnostic procedure giving each of the following results: false positive, false negative, true positive, true negative?
It's still possible to generate the data you want, but it's not as easy as in (a).
Source: https://stackoverflow.com/questions/62263404/how-do-i-simulate-a-left-truncated-weibull-failure-time-data-in-r