Extract slope of multiple trend lines from geom_smooth()

孤城傲影 2021-01-18 10:17

I am trying to plot multiple trend lines (every ten years) in a time series using ggplot.

Here's the data:

dat <- structure(list(YY = 1961:2010,          


        
1 Answer
  •  粉色の甜心
    2021-01-18 10:30

    Each of these tasks is best handled before you pipe your data into ggplot2, and all of them are fairly easy with some of the other packages from the tidyverse.

    Beginning with questions 1 and 2:

    ggplot2 can plot the regression line, but to extract the estimated slope coefficients you need to work with the lm() object explicitly. Using group_by() and mutate(), you can add a grouping variable (my code below uses 5-year groups, just as an example) and then calculate the slope estimate for each group and store it in a column of your existing data frame. Those slope estimates can then be drawn on the plot with geom_text(). I've done this below in a quick-and-dirty way (placing each label at the mean of the x and y values of its group), but you can specify the exact placement in your data frame.
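As a minimal sketch of the per-group slope extraction described above, the snippet below fits one lm() per group and keeps only the slope. The data frame `toy`, its values, and the `decade` column are made up for illustration (the question's `dat` is not reproduced here), and summarise() is used instead of mutate() to get one row per group.

```r
library(dplyr)

# Made-up data: two 10-year stretches with slopes of roughly 3 and -1
toy <- data.frame(
  YY = 1961:1980,
  a  = c(3 * (0:9), 27 - (0:9)) + rep(c(0.5, -0.5), 10)
)

slopes <- toy %>%
  mutate(decade = (YY - min(YY)) %/% 10 + 1) %>%  # 1 = 1961-1970, 2 = 1971-1980
  group_by(decade) %>%
  summarise(
    # response on the left, year on the right: the slope is in units of a per year
    slope = unname(coef(lm(a ~ YY))[2]),
    .groups = "drop"
  )

slopes
```

Using mutate() instead, as in the answer's code, repeats the group's slope on every row, which is convenient when the same data frame is later passed to ggplot.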

    Grouping variables and data prep make question 2 a breeze too: now that the grouping variable is explicitly in your data frame, there is no need to plot the lines one by one, because geom_smooth() accepts the group aesthetic.

    Additionally, to answer question 3, you can extract the p-value from the summary of each lm object and keep only the groups that are significant at the level you care about. If you pass this now-complete data frame to geom_smooth() and geom_text(), you will get the plot you're looking for!
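For reference, here is a small base R sketch of where the slope's p-value sits inside summary(lm). The data frame `toy` and the 0.05 cutoff are assumptions chosen only for illustration.

```r
# Made-up data: a clear upward trend with a little alternating noise
toy <- data.frame(
  YY = 1961:1970,
  a  = 3 * (0:9) + rep(c(0.5, -0.5), 5)
)

fit   <- lm(a ~ YY, data = toy)
coefs <- summary(fit)$coefficients   # matrix with columns Estimate, Std. Error, t value, Pr(>|t|)

p_value <- coefs["YY", "Pr(>|t|)"]   # same cell as coefs[2, 4]
keep    <- p_value < 0.05            # TRUE if the trend passes the chosen cutoff
```

Indexing by row and column name is a little more robust than `[2, 4]` if the model formula changes later.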

    library(tidyverse)
    
    # set up our base plot
    p <- ggplot(dat, aes(x = YY, y = a)) +
      geom_line(colour = "blue", lwd = 1) +
      geom_point(colour = "blue", size = 2) +
      theme(
        panel.background = element_rect(fill = "white"),
        plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm"),
        panel.border = element_rect(colour = "black", fill = NA, size = 1),
        axis.line.x = element_line(colour = "black"),
        axis.line.y = element_line(colour = "black"),
        axis.text = element_text(size = 15, colour = "black", family = "serif"),
        axis.title = element_text(size = 15, colour = "black", family = "serif"),
        legend.position = "top"
      ) +
      scale_x_continuous(breaks = seq(1961, 2010, 5), expand = c(0, 0))   # YY is numeric, so use a continuous scale
    
    # add a grouping variable (or many!)
    prep5 <- dat %>%
      mutate(group5 = rep(1:10, each = 5)) %>%   # ten consecutive 5-year groups
      group_by(group5) %>%
      mutate(
        # regress a on YY so the slope matches the line geom_smooth() draws
        slope = round(lm(a ~ YY)$coefficients[2], 2),
        significance = summary(lm(a ~ YY))$coefficients[2, 4],   # p-value of the slope
        x = mean(YY),   # x coordinate for slope label
        y = mean(a)     # y coordinate for slope label
      ) %>%
      filter(significance < .2)   # only keep groups with a p-value < .2
    
    p + geom_smooth(
      data = prep5, aes(x = YY, y = a, group = group5),  # grouping variable does the plots for us!
      method = "lm", se = FALSE, color = "black",
      formula = y ~ x, linetype = "dashed"
    ) +
      geom_text(
        data = prep5, aes(x = x, y = y, label = slope),
        nudge_y = 12, nudge_x = -1
      )
    

    You may want to be more careful about the location of your text labels than I have been here. I used group means and the nudge_* arguments of geom_text() for a quick example, but keep in mind that, since these values are mapped explicitly to x and y coordinates, you have complete control!
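One possible way to take that control, sketched under assumptions: build a separate one-row-per-group label data frame and choose the coordinates yourself. Here each label is anchored at the last year of its group, just above the group's highest point. The data frame `toy`, the `decade` column, and the placement rule are all hypothetical.

```r
library(dplyr)
library(ggplot2)

# Made-up data: two 10-year groups
toy <- data.frame(
  YY     = 1961:1980,
  a      = c(3 * (0:9), 27 - (0:9)) + rep(c(0.5, -0.5), 10),
  decade = rep(1:2, each = 10)
)

# One row per group: the slope plus hand-picked label coordinates
labels <- toy %>%
  group_by(decade) %>%
  summarise(
    slope = round(unname(coef(lm(a ~ YY))[2]), 2),
    x = max(YY),      # right-hand end of the group's segment
    y = max(a) + 1,   # just above the highest point in the group
    .groups = "drop"
  )

p_toy <- ggplot(toy, aes(x = YY, y = a)) +
  geom_point() +
  geom_smooth(aes(group = decade), method = "lm", formula = y ~ x, se = FALSE) +
  geom_text(data = labels, aes(x = x, y = y, label = slope), hjust = 1)
```

Keeping the labels in their own data frame also avoids drawing the same label once per data row, which is what happens when the label columns are repeated on every row of the plotting data.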

    Created on 2018-07-16 by the reprex package (v0.2.0).
