Plotting the means with confidence intervals with ggplot


I have some data that I have gathered from a model. I want to plot the size of a population over time. I have the population size at each time step, and 100 replicates. I would

1 answer
  •  醉话见心
    2021-02-08 13:41

    Since you have replicated data and want to plot the mean with confidence limits, you are probably better off using stat_summary(...), which is designed for (you guessed it) summarizing data. Basically, it applies a function (mean(...), for example) to all the y-values at each x-value, and then plots the result using whatever geometry you specify. Here's an example:

    # sample data - should be provided in question
    set.seed(1)      # for reproducible example
    time <- 1:25
    df   <- data.frame(time,
                       pop=rnorm(100*length(time), mean=10*time/(25+time)))
    
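    library(ggplot2)   # mean_cl_normal additionally requires the Hmisc package
    # Note: passing conf.int (and width) directly works only in ggplot2 before 2.0.0
    ggplot(df, aes(x=time, y=pop))+ 
      stat_summary(geom="ribbon", fun.data=mean_cl_normal, width=0.1, conf.int=0.95, fill="lightblue")+
      stat_summary(geom="line", fun.y=mean, linetype="dashed")+
      stat_summary(geom="point", fun.y=mean, color="red")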
    

    So here we have three layers: one that summarizes the y-values using the mean(...) function and plots them using geom="line", one that summarizes the same way but plots using geom="point", and one that uses geom="ribbon". The ribbon geom requires ymin and ymax aesthetics, so we use ggplot2's mean_cl_normal (a wrapper around Hmisc::smean.cl.normal, so the Hmisc package must be installed) to generate them, on the assumption that the error is normally distributed and that, therefore, the standardized means follow a t-distribution. Type ?hmisc for documentation on the various functions that are useful for confidence limits. The layers render in the order of the code, so, since you want shading, the error ribbon has to come first; otherwise it would be drawn over the line and points.
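To make the t-distribution assumption concrete, here is a minimal base-R sketch of the interval that mean_cl_normal is understood to compute, worked by hand for a single time step. The choice of time step 10 and the variable names (y, se, tq, ci) are illustrative, not part of the original answer; the data are regenerated so the snippet runs on its own.

```r
# Hand-computed 95% t-interval for the mean at one time step (base R only).
set.seed(1)                                   # same seed as the sample data
time <- 1:25
df   <- data.frame(time,
                   pop = rnorm(100 * length(time), mean = 10 * time / (25 + time)))

y  <- df$pop[df$time == 10]                   # the 100 replicates at time 10 (arbitrary choice)
n  <- length(y)
se <- sd(y) / sqrt(n)                         # standard error of the mean
tq <- qt(0.975, df = n - 1)                   # two-sided 95% t quantile

ci <- c(y = mean(y), ymin = mean(y) - tq * se, ymax = mean(y) + tq * se)
ci

# t.test() computes the same interval, which gives a convenient cross-check
all.equal(unname(ci[c("ymin", "ymax")]), as.numeric(t.test(y)$conf.int))
```

The three values y, ymin and ymax are what the ribbon layer needs at each x-value; stat_summary simply repeats this calculation for every time step.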

    Finally, it is of course possible to summarize the data yourself, using dplyr or some such, but I don't really see the point of doing that.
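For completeness, the do-it-yourself route could look roughly like the sketch below, assuming the dplyr package is installed. The summary column names (n_rep, avg, se, ymin, ymax) are made up for illustration, and the interval is the usual t-based one rather than a call to mean_cl_normal.

```r
library(dplyr)
library(ggplot2)

# Same sample data as in the answer, regenerated so the snippet stands alone
set.seed(1)
time <- 1:25
df   <- data.frame(time,
                   pop = rnorm(100 * length(time), mean = 10 * time / (25 + time)))

# One row per time step: mean and 95% t-based confidence limits
summ <- df %>%
  group_by(time) %>%
  summarise(n_rep = n(),
            avg   = mean(pop),
            se    = sd(pop) / sqrt(n_rep),
            ymin  = avg - qt(0.975, n_rep - 1) * se,
            ymax  = avg + qt(0.975, n_rep - 1) * se)

# Plot the pre-computed summary with ordinary geoms; ribbon first so it sits underneath
p <- ggplot(summ, aes(x = time, y = avg)) +
  geom_ribbon(aes(ymin = ymin, ymax = ymax), fill = "lightblue") +
  geom_line(linetype = "dashed") +
  geom_point(color = "red")
p
```

This takes more code than the stat_summary version, but it leaves you with a summary table you can inspect or export.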

    Update (based on recent comment): Looks like the most recent version of ggplot2 (2.0.0) has a different way of specifying the arguments to fun.data. This works in the new version:

    ggplot(df, aes(x=time, y=pop))+ 
        stat_summary(geom="ribbon", fun.data=mean_cl_normal, 
                     fun.args=list(conf.int=0.95), fill="lightblue")+
        stat_summary(geom="line", fun.y=mean, linetype="dashed")+
        stat_summary(geom="point", fun.y=mean, color="red")
    

    The problem with the width=... argument is a bit more subtle I think: it actually isn't needed (in the original answer I used error bars, and forgot to remove this argument when I changed it to ribbon). The older version of ggplot2 ignored extraneous arguments (hence, no error). The new version, evidently, is more strict. Probably this is better.
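A further note for later releases: as of ggplot2 3.3.0, the fun.y argument of stat_summary is deprecated in favor of fun. The sketch below is the same plot adapted to that argument name, assuming ggplot2 3.3.0 or later with the Hmisc package installed; it is an adaptation, not part of the original answer.

```r
library(ggplot2)   # mean_cl_normal also needs the Hmisc package installed

# Same sample data as in the answer, regenerated so the snippet stands alone
set.seed(1)
time <- 1:25
df   <- data.frame(time,
                   pop = rnorm(100 * length(time), mean = 10 * time / (25 + time)))

p <- ggplot(df, aes(x = time, y = pop)) +
  # ribbon first so the line and points are drawn on top of it
  stat_summary(geom = "ribbon", fun.data = mean_cl_normal,
               fun.args = list(conf.int = 0.95), fill = "lightblue") +
  # `fun` replaces the deprecated `fun.y`
  stat_summary(geom = "line",  fun = mean, linetype = "dashed") +
  stat_summary(geom = "point", fun = mean, color = "red")
p
```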
