R - reformat P value in ggplot using 'stat_compare_means'

流过昼夜 submitted on 2021-02-18 19:00:28

Question


I want to add the p values to each panel in a faceted ggplot. If the p value is larger than 0.05, I want to display it as it is. If the p value is smaller than 0.05, I want to display it in scientific notation (i.e., 0.0032 -> 3.20e-3; 0.0000425 -> 4.25e-5).

The code I wrote to do this is:

p1 <- ggplot(data = CD3, aes(location, value, color = factor(location),
                             fill = factor(location))) +
  theme_bw(base_rect_size = 1) +
  geom_boxplot(alpha = 0.3, size = 1.5, show.legend = FALSE) +
  geom_jitter(width = 0.2, size = 2, show.legend = FALSE) +
  scale_color_manual(values=c("#4cdee6", "#e47267", "#13ec87")) +
  scale_fill_manual(values=c("#4cdee6", "#e47267", "#13ec87")) +
  ylab(expression(paste("Density of clusters, ", mm^{-2}))) +
  xlab(NULL) +
  stat_compare_means(comparisons = list(c("CT", 'N'), c("IF","N")), 
                     aes(label = ifelse(..p.format.. < 0.05, formatC(..p.format.., format = "e", digits = 2),
                                        ..p.format..)), 
                     method = 'wilcox.test', show.legend = FALSE, size = 10) +
  #ylab(expression(paste('Density, /', mm^2, )))+
  theme(axis.text = element_text(size = 10), 
        axis.title = element_text(size = 20), 
        legend.text = element_text(size = 38), 
        legend.title = element_text(size = 40), 
        strip.background = element_rect(colour="black", fill="white", size = 2),
        strip.text = element_text(margin = margin(10, 10, 10, 10), size = 40),
        panel.grid = element_line(size = 1.5))
plot(p1)

This code runs without error; however, the format of the numbers doesn't change. What am I doing wrong? The data to reproduce the plot is included below.

EDIT

structure(list(value = c(0.931966449207829, 3.24210526315789, 
3.88811650210901, 0.626860993574675, 4.62085308056872, 0.477508650519031, 
0.111900110501359, 3.2495164410058, 4.06626506024096, 0.21684918139434, 
1.10365086026018, 4.66666666666667, 0.174109967855698, 0.597625869832174, 
2.3758865248227, 0.360751947840548, 1.00441501103753, 3.65168539325843
), Criteria = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Density", "Density of cluster", 
"nodular count", "Elongated count"), class = "factor"), Case = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 
6L), .Label = c("Case 1A", "Case 1B", "Case 2", "Case 3", "Case 4", 
"Case 5"), class = "factor"), Mark = structure(c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("CD3", 
"CD4", "CD8", "CD20", "FoxP3"), class = "factor"), location = structure(c(3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L), .Label = c("CT", "IF", "N"), class = "factor")), row.names = c(91L, 
92L, 93L, 106L, 107L, 108L, 121L, 122L, 123L, 136L, 137L, 138L, 
151L, 152L, 153L, 166L, 167L, 168L), class = "data.frame")

Answer 1:


I think the issue comes from using the comparisons argument of stat_compare_means. I'm not totally sure, but my guess is that the p value output of stat_compare_means differs from that of compare_means, so you can't use it in the label aesthetic.

Let me explain. With your example, you can modify the display of the p value like this:

library(ggplot2)
library(ggpubr)
ggplot(df, aes(x = location, y = value, color = location)) +
  geom_boxplot() +
  stat_compare_means(ref.group = "N",
                     aes(label = ifelse(p < 0.05,
                                        sprintf("p = %2.1e", as.numeric(..p.format..)),
                                        ..p.format..)))

This displays the p value correctly, but you lose the bars. If you use the comparisons argument instead, you get:

library(ggplot2)
library(ggpubr)
ggplot(df, aes(x = location, y = value, color = location)) +
  geom_boxplot() +
  stat_compare_means(comparisons = list(c("CT", "N"), c("IF", "N")),
                     aes(label = ifelse(p < 0.05,
                                        sprintf("p = %2.1e", as.numeric(..p.format..)),
                                        ..p.format..)))

So now you get the bars, but not the correct display.
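As a side note on the condition in the question's code: the compare_means output shown further down lists p.format as a character column, so a test such as ..p.format.. < 0.05 would compare strings rather than numbers. The base R sketch below uses made-up values (p_num and p_chr are hypothetical, not taken from the data) only to illustrate that pitfall. It does not by itself explain why the label is ignored when comparisons is set.

```r
# Hypothetical p values, stored once as numbers and once as formatted strings.
p_num <- c(0.0087, 1e-04, 0.2)
p_chr <- c("0.0087", "1e-04", "0.2")

# Numeric comparison: the first two values are below 0.05.
num_below <- p_num < 0.05

# Character vs numeric: R converts 0.05 to "0.05" and compares strings,
# so "1e-04" sorts after "0.05" and is reported as not below the threshold.
chr_below <- p_chr < 0.05

# Converting back to numeric first restores the intended comparison.
fixed_below <- as.numeric(p_chr) < 0.05
```

This is probably why the answer's code tests the numeric p column and only uses p.format for the text that is displayed.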

To work around this issue, you can compute the statistics outside of ggplot2 with the compare_means function and use the ggsignif package to draw the labels in the format you want.

Here, I'm using dplyr and its mutate function to create new columns, but you can do it easily in base R.

library(dplyr)
library(magrittr)
c <- compare_means(value~location, data = df, ref.group = "N")
c %<>% mutate(y_pos = c(5,5.5), labels = ifelse(p < 0.05, sprintf("%2.1e",p),p))

# A tibble: 2 x 10
  .y.   group1 group2       p p.adj p.format p.signif method   y_pos labels 
  <chr> <chr>  <chr>    <dbl> <dbl> <chr>    <chr>    <chr>    <dbl> <chr>  
1 value N      CT     0.00866 0.017 0.0087   **       Wilcoxon   5   8.7e-03
2 value N      IF     0.00866 0.017 0.0087   **       Wilcoxon   5.5 8.7e-03
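For readers who prefer to avoid dplyr, here is a minimal base R sketch of the same step. The data frame res is a hand-built stand-in for as.data.frame(compare_means(...)), with the group and p values copied from the output above; the name res is an assumption made for illustration only.

```r
# Stand-in for the compare_means() result; values copied from the output above.
res <- data.frame(
  group1 = c("N", "N"),
  group2 = c("CT", "IF"),
  p = c(0.00866, 0.00866),
  stringsAsFactors = FALSE
)

# Base R equivalent of the mutate() call: bar heights and formatted labels.
res$y_pos <- c(5, 5.5)
res$labels <- ifelse(res$p < 0.05,
                     sprintf("%2.1e", res$p),  # scientific notation below 0.05
                     as.character(res$p))      # otherwise keep the value as is
```

With a real compare_means result, the same two assignments should work after converting it with as.data.frame().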

Then, you can plot it:

library(ggplot2)
library(ggpubr)
library(ggsignif)
ggplot(df, aes(x = location, y = value))+
  geom_boxplot(aes(colour = location))+
  ylim(0,6)+
  geom_signif(data = as.data.frame(c), aes(xmin=group1, xmax=group2, annotations=labels, y_position=y_pos),
                manual = TRUE)

Does this look like what you are trying to plot?



Source: https://stackoverflow.com/questions/59494698/r-reformat-p-value-in-ggplot-using-stat-compare-means
