Question
I'm trying to plot survival curves for several variables, faceted by sex, with the ggsurvplot_facet() function. When I apply my code to a single fitted model, it works fine. However, when I use the same code inside a function or a for loop, it fails to plot the survival curves and returns an error. I would do this in ggsurvplot_facet() itself if it accepted a list of survfit objects as input, the way ggsurvplot() does, but ggsurvplot_facet() only accepts a single survfit object at a time.
I'm running my code in RStudio on a 2018 MacBook Pro with macOS High Sierra.
Consider the following dataset: http://s000.tinyupload.com/index.php?file_id=01704535336107726906
It contains the observations for several visits for 100 subjects and 4 different variables. Two of the variables (variable1 and variable2) can have two different values (0 or 1) and the two other variables (variable3 and variable4) can have three different values (0, 1 or 2).
I have started to work with the ones that can have two different values and I have written the following code:
# Load libraries
require(mgcv)
require(msm)
library(dplyr)
library(grDevices)
library(survival)
library(survminer)
# Set working directory
dirname<-dirname(rstudioapi::getSourceEditorContext()$path)
setwd(dirname)
load("ggsurvplot_facet_error.rda")
fit_test <- survfit(
  Surv(follow_up, as.numeric(status)) ~ (sex + variable1), data = data)
plot_test <- ggsurvplot_facet(fit_test,
  data = data,
  pval = TRUE,
  conf.int = TRUE,
  surv.median.line = "hv", # Specify median survival
  break.time.by = 1,
  facet.by = "sex",
  ggtheme = theme_bw(), # Change ggplot2 theme
  palette = "aaas",
  legend = "bottom",
  xlab = "Time (years)",
  ylab = "Death probability",
  panel.labs = list(sex_recoded = c("Male", "Female")),
  legend.labs = c("A", "B")
)
plot_test
This code works great and generates the following plot:
However, when I try to convert this code into a function or a FOR loop, so that it applies the same code to variable1 and variable2, I always get an error with the color/palette part of the plotting step.
# Variables_with_2_categories: variable1 and variable2
two <- c("variable1", "variable2")
## TEST #1: USING A FUNCTION
fit_plot_function <- function(x) {
  # FIT part of the function
  two.i <- two[i]
  fit_temp <- survfit(Surv(as.numeric(follow_up), as.numeric(status)) ~
    sex + eval(as.name(paste0(two.i))), data = data)
  # PLOT part of the function
  plot_temp <- ggsurvplot_facet(fit_temp,
    data = data,
    pval = TRUE,
    conf.int = TRUE,
    surv.median.line = "hv", # Specify median survival
    break.time.by = 1,
    facet.by = "sex",
    ggtheme = theme_bw(), # Change ggplot2 theme
    palette = "aaas",
    legend = "bottom",
    xlab = "Time (years)",
    ylab = "Death probability",
    panel.labs = list(sex_recoded = c("Male", "Female")),
    legend.labs = rep(c("A", "B"), 2)
  )
}
fit_plot_function(two)
# Warning message:
# Now, to change color palette, use the argument palette=
# 'eval(as.name(paste0(two.i)))' instead of color = 'eval(as.name(paste0(two.i)))'
print(plot_temp)
# Error in grDevices::col2rgb(colour, TRUE) :
# invalid color name 'eval(as.name(paste0(two.i)))'
It looks like the variable names are not recognized when they are passed in through a vector and evaluated inside the formula. Exactly the same thing happens with a for loop:
## TEST #2: USING A FOR LOOP
n.two <- length(two)
for (i in 1:n.two) {
  two.i <- two[i]
  fit_temp <- survfit(Surv(as.numeric(follow_up), as.numeric(status)) ~
    (sex + eval(as.name(paste0(two.i)))), data = data)
  plot_temp <- ggsurvplot_facet(fit_temp,
    data = data,
    pval = TRUE,
    conf.int = TRUE,
    surv.median.line = "hv", # Specify median survival
    break.time.by = 1,
    facet.by = "sex",
    ggtheme = theme_bw(), # Change ggplot2 theme
    palette = "aaas",
    legend = "bottom",
    xlab = "Time (years)",
    ylab = "Death probability",
    panel.labs = list(sex_recoded = c("Male", "Female")),
    legend.labs = rep(c("A", "B"), 2)
  )
}
print(plot_temp)
# ERROR: Now, to change color palette, use the argument palette= 'eval(as.name(paste0(two.i)))'
# instead of color = 'eval(as.name(paste0(two.i)))
As an additional comment, it would be great if I could apply the same code to all of the variables at once, whether they have two or three different values, instead of having to write a different function for each kind.
Thank you very much for your help,
Best Regards,
Yatrosin
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] survminer_0.4.3.999 ggpubr_0.2 magrittr_1.5 ggplot2_3.1.1 survival_2.44-1.1
[6] dplyr_0.8.0.1 msm_1.6.7 mgcv_1.8-27 nlme_3.1-137
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 pillar_1.3.1 compiler_3.5.1 plyr_1.8.4 tools_3.5.1 digest_0.6.18
[7] tibble_2.1.1 gtable_0.3.0 lattice_0.20-38 pkgconfig_2.0.2 rlang_0.3.4 Matrix_1.2-17
[13] ggsci_2.9 rstudioapi_0.10 cmprsk_2.2-7 yaml_2.2.0 mvtnorm_1.0-10 expm_0.999-4
[19] xfun_0.6 gridExtra_2.3 knitr_1.22 withr_2.1.2 survMisc_0.5.5 generics_0.0.2
[25] grid_3.5.1 tidyselect_0.2.5 data.table_1.12.2 glue_1.3.1 KMsurv_0.1-5 R6_2.4.0
[31] km.ci_0.5-2 purrr_0.3.2 tidyr_0.8.3 scales_1.0.0 backports_1.1.4 splines_3.5.1
[37] assertthat_0.2.1 xtable_1.8-3 colorspace_1.4-1 labeling_0.3 lazyeval_0.2.2 munsell_0.5.0
[43] broom_0.5.2 crayon_1.3.4 zoo_1.8-5
Answer 1:
It's time to purrrify. What you want can be done with purrr. You can read about making ggplot2 purrr here and more examples here.
First of all, we need to transform your data to long format with tidyr::gather. We'll keep everything in the data frame as it was, except variable1, variable2, variable3 and variable4, which will be melted into two columns (num and variable).
library(tidyr)
library(dplyr)
library(purrr)
data %>%
  gather(num, variable, -sample_id, -sex,
         -visit_number, -age_at_enrollment,
         -follow_up, -status) %>%
  mutate(num2 = num) %>% # We'll need this column later for the titles
  as_tibble() -> long_data
# A tibble: 2,028 x 8
sample_id sex visit_number age_at_enrollment follow_up status num variable
<fct> <fct> <fct> <dbl> <dbl> <fct> <chr> <int>
1 sample_0001 Female 1 56.7 0 1 variable1 0
2 sample_0001 Female 2 57.7 0.920 1 variable1 0
3 sample_0001 Female 3 58.6 1.90 1 variable1 0
4 sample_0001 Female 4 59.7 2.97 2 variable1 0
5 sample_0001 Female 5 60.7 4.01 1 variable1 0
6 sample_0001 Female 6 61.7 4.99 1 variable1 0
7 sample_0002 Female 1 55.9 0 1 variable1 1
8 sample_0002 Female 2 56.9 1.04 1 variable1 1
9 sample_0002 Female 3 58.0 2.15 1 variable1 1
10 sample_0002 Female 4 59.0 3.08 1 variable1 1
# ... with 2,018 more rows
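In current tidyr releases, gather() is superseded by pivot_longer(). The sketch below shows the same reshaping step on a small made-up data frame; the toy column values are invented for illustration and are not taken from the dataset in the question.

```r
library(tidyr)

# Hypothetical miniature of the question's data: two subjects, three variables
toy <- data.frame(
  sample_id = c("s1", "s2"),
  sex       = c("Female", "Male"),
  variable1 = c(0L, 1L),
  variable2 = c(1L, 0L),
  variable3 = c(2L, 0L)
)

# One row per subject and variable: the old column name goes to `num`,
# the old cell value goes to `variable`
toy_long <- pivot_longer(
  toy,
  cols      = starts_with("variable"),
  names_to  = "num",
  values_to = "variable"
)

print(toy_long)
```

Every column not named in cols is kept as an identifier, which plays the same role as the list of negated columns in the gather() call.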
Now we need to transform our long data frame into a nested data frame and map! Be careful with ggsurvplot: this function doesn't support tibbles, which are created by nest().
long_data %>%
  group_by(num) %>%
  nest() %>%
  mutate(
    # Run survfit() for every variable
    fit_f = map(data, ~survfit(Surv(follow_up, as.numeric(status)) ~ (sex + variable), data = .)),
    # Create survplot for every variable and survfit
    plots = map2(fit_f, data, ~ggsurvplot(.x,
        as.data.frame(.y), # Important! convert from tibble to data.frame
        pval = TRUE,
        conf.int = TRUE,
        facet.by = "sex",
        surv.median.line = "hv",
        break.time.by = 1,
        ggtheme = theme_bw(),
        palette = "aaas",
        xlab = "Time (years)",
        ylab = "Death probability") +
      ggtitle(paste0("This is plot of ", .y$num2)) + # Add a title
      theme(legend.position = "bottom"))) -> plots
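Since the dataset from the question is only available through an external link, here is a minimal sketch of the nest-and-map fitting step on the lung data that ships with the survival package. The derived columns older and lost_weight are hypothetical stand-ins for variable1 and variable2, and pivot_longer() is used in place of gather(); the plotting step is left out on purpose.

```r
library(survival)
library(dplyr)
library(tidyr)
library(purrr)

# Two made-up binary variables standing in for variable1 / variable2
toy <- survival::lung %>%
  mutate(older       = as.integer(age > 65),
         lost_weight = as.integer(wt.loss > 10)) %>%
  dplyr::select(time, status, sex, older, lost_weight)

nested <- toy %>%
  pivot_longer(c(older, lost_weight),
               names_to = "num", values_to = "variable") %>%
  group_by(num) %>%
  nest() %>%
  # One survfit object per variable, stored in a list column
  mutate(fit_f = map(data, ~ survfit(Surv(time, status) ~ sex + variable,
                                     data = .x)))

nested$num
map_chr(nested$fit_f, ~ class(.x)[1])
```

Because every fit uses the same column name (variable) in its formula, the formula never has to be built from a string, which is what tripped up the loop in the question.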
Now you can return your plots by typing this:
plots$plots[[1]]
plots$plots[[2]]
plots$plots[[3]]
plots$plots[[4]] # plotted below
And save all your plots using map2()
map2(paste0(unique(long_data$num), ".pdf"), plots$plots, ggsave)
UPDATE
Unfortunately, I cannot figure out how to change the legend labels. The only solution I can suggest is below. Remember that plots$plots[[…]] is a ggplot object, so you can change everything afterwards. For example, to change the legend labels I just need to add scale_fill_discrete and scale_color_discrete. The same can be done with the title, labs, theme etc.
library(ggsci) # to add aaas color palette
plots$plots[[3]] +
  labs(title = "Variable 3",
       subtitle = "You just have to be the best") +
  ggsci::scale_color_aaas(guide = FALSE) +
  ggsci::scale_fill_aaas(label = LETTERS[1:3])
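As a side note on why the loop in the question fails: a likely cause is that the literal text eval(as.name(...)) becomes the term label of the fitted model, so the plotting code later looks for a column with that name. The sketch below illustrates this using only the survival package and its lung data (not the question's dataset); the variable name v is hypothetical. Building the formula from a string gives the strata their real column name.

```r
library(survival)

v <- "ph.ecog"  # hypothetical: name of the covariate, held in a string

# Pattern from the question: the expression itself ends up in the strata names
fit_eval <- survfit(Surv(time, status) ~ sex + eval(as.name(v)), data = lung)
head(names(fit_eval$strata), 2)

# Alternative: build the formula text first, then convert it to a formula
f <- as.formula(paste("Surv(time, status) ~ sex +", v))
fit_named <- survfit(f, data = lung)
head(names(fit_named$strata), 2)
```

Whether this alone is enough for ggsurvplot_facet() inside a function has not been verified here. survminer also exports surv_fit(), a wrapper around survfit() meant for programmatic use, which may be worth trying in that setting.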
Source: https://stackoverflow.com/questions/55712484/ggsurvplot-facet-returns-error-in-grdevicescol2rgbcolour-true-invalid-c