Plotting predicted survival curves for continuous covariates in ggplot

问题

How can I plot survival curves for representative values of a continuous covariate in a cox proportional hazards model? Specifically, I would like to do this in ggplot using a "survfit.cox" "survfit" object.

This may seem like a question that has already been answered, but I have searched through everything in SO with the terms 'survfit' and 'newdata' (plus many other search terms). This is the thread that comes closest to answering my question so far: Plot Kaplan-Meier for Cox regression

In keeping with the reproducible example offered in one of the answers to that post:

url <- "http://socserv.mcmaster.ca/jfox/Books/Companion/data/Rossi.txt"
df <- read.table(url, header = TRUE)

library(dplyr)
library(ggplot2)
library(survival)
library(magrittr)
library(broom)

# Identifying the 25th and 75th percentiles for prio (continuous covariate)

summary(df$prio)

# Cox proportional hazards model with other covariates
# 'prio' is our explanatory variable of interest

m1 <- coxph(Surv(week, arrest) ~ 
                       fin + age + race + prio,
                     data = df)

# Creating new df to get survival predictions
# Want separate curves for the the different 'fin' and 'race'
# groups as well as the 25th and 75th percentile of prio

newdf <- df %$%
  expand.grid(fin = levels(fin), 
                    age = 30, 
                    race = levels(race), 
                    prio = c(1,4))

# Obtain the fitted survival curve, then tidy 
# into a dataframe that can be used in ggplot

survcurv <- survfit(m1, newdata = newdf) %>%
  tidy()

The problem is, that once I have this dataframe called survcurv, I cannot tell which of the 'estimate' variables belongs to which pattern because none of the original variables are retained. For example, which of the 'estimate' variables represents the fitted curve for 30 year old, race = 'other', prio = '4', fin = 'no'?

In all other examples i've seen, usually one puts the survfit object into a generic plot() function and does not add a legend. I want to use ggplot and add a legend for each of the predicted curves.

In my own dataset, the model is a lot more complex and there are a lot more curves than I show here, so as you can imagine seeing 40 different 'estimate.1'..'estimate.40' variables makes it hard to understand what is what.

回答1:

Thanks for providing a well phrased question and a good example. I'm a little surpirsed that tidy does a relatively poor job here of creating sensible output. Please see below for my attempt at creating some plottable data:

library(tidyr)
newdf$group <- as.character(1:nrow(newdf))

survcurv <- survfit(m1, newdata = newdf) %>%
  tidy() %>% 
  gather('key', 'value', -time, -n.risk, -n.event, -n.censor) %>% 
  mutate(group = substr(key, nchar(key), nchar(key)),
         key   = substr(key, 1, nchar(key) - 2)) %>% 
  left_join(newdf, 'group') %>% 
  spread(key, value)

And the create a plot (perhaps you'd like to use geom_step instead, but there is not step shaped ribbon, unfortunately):

ggplot(survcurv, aes(x = time, y = estimate, ymin = conf.low, ymax = conf.high,
                     col = race, fill = race)) +
  geom_line(size = 1) +
  geom_ribbon(alpha = 0.2, col = NA) +
  facet_grid(prio ~ fin)

回答2:

Try defining your survcurv like this:

survcurv <- 
  lapply(1:nrow(newdf),
         function(x, m1, newdata){
           cbind(newdata[x, ], survfit(m1, newdata[x, ]) %>% tidy)
         },
         m1, 
         newdf) %>%
  bind_rows()

This will include all of the predictor values as columns with the predicted estimates.

来源：https://stackoverflow.com/questions/39872335/plotting-predicted-survival-curves-for-continuous-covariates-in-ggplot

标签

ggplot2

survival-analysis

cox-regression

broom