qqnorm and qqline in ggplot2

隐瞒了意图╮ 2020-12-04 06:37

Say I have a linear model LM and I want a Q-Q plot of its residuals. Normally I would use R base graphics:

qqnorm(residuals(LM), ylab="Residuals")
qqline(residuals(LM))


        
8 Answers
  • 2020-12-04 06:48

    You can also add confidence intervals (confidence bands) with this function (parts of the code are copied from car:::qqPlot):

    library(ggplot2)

    gg_qq <- function(x, distribution = "norm", ..., line.estimate = NULL, conf = 0.95,
                      labels = names(x)){
      # look up the quantile and density functions, e.g. qnorm() and dnorm()
      q.function <- match.fun(paste0("q", distribution))
      d.function <- match.fun(paste0("d", distribution))
      x <- na.omit(x)
      ord <- order(x)
      n <- length(x)
      P <- ppoints(n)
      df <- data.frame(ord.x = x[ord], z = q.function(P, ...))
    
      if(is.null(line.estimate)){
        # reference line through the first and third quartiles (as in qqline)
        Q.x <- quantile(df$ord.x, c(0.25, 0.75))
        Q.z <- q.function(c(0.25, 0.75), ...)
        b <- diff(Q.x)/diff(Q.z)
        coef <- c(Q.x[1] - b * Q.z[1], b)
      } else {
        # reference line from a fitting function such as lm or MASS::rlm
        coef <- coef(line.estimate(ord.x ~ z, data = df))
      }
    
      # pointwise confidence band around the reference line
      zz <- qnorm(1 - (1 - conf)/2)
      SE <- (coef[2]/d.function(df$z, ...)) * sqrt(P * (1 - P)/n)
      fit.value <- coef[1] + coef[2] * df$z
      df$upper <- fit.value + zz * SE
      df$lower <- fit.value - zz * SE
    
      # label only the points that fall outside the band
      if(!is.null(labels)){ 
        df$label <- ifelse(df$ord.x > df$upper | df$ord.x < df$lower, labels[ord], "")
      }
    
      p <- ggplot(df, aes(x = z, y = ord.x)) +
        geom_point() + 
        geom_abline(intercept = coef[1], slope = coef[2]) +
        geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) 
      if(!is.null(labels)) p <- p + geom_text(aes(label = label))
      print(p)
      coef   # intercept and slope of the reference line
    }
    

    Example:

    data(Animals2, package = "robustbase")   # loads Animals2 into the workspace
    mod.lm <- lm(log(brain) ~ log(body), data = Animals2)
    x <- rstudent(mod.lm)
    gg_qq(x)
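    For reference, the band in gg_qq can be read off the code as follows (a summary of what the function computes, not taken from car's documentation). With p_i = ppoints(n), F the chosen distribution, f its density (d.function), and â, b̂ the intercept and slope stored in coef, each point gets the large-sample standard error of the p_i-th sample quantile, with the density rescaled by the fitted slope:

```latex
\begin{aligned}
z_i &= F^{-1}(p_i), \qquad \hat{y}_i = \hat{a} + \hat{b}\, z_i,\\
\mathrm{SE}_i &= \frac{\hat{b}}{f(z_i)} \sqrt{\frac{p_i\,(1-p_i)}{n}},\\
\text{upper}_i,\ \text{lower}_i &= \hat{y}_i \pm z_{1-(1-\mathrm{conf})/2}\, \mathrm{SE}_i .
\end{aligned}
```

    Points whose ordered value falls outside [lower_i, upper_i] are the ones gg_qq labels.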
    


  • 2020-12-04 06:49

    The standard Q-Q diagnostic for linear models plots quantiles of the standardized residuals vs. theoretical quantiles of N(0,1). @Peter's ggQQ function plots the residuals. The snippet below amends that and adds a few cosmetic changes to make the plot more like what one would get from plot(lm(...)).

    library(ggplot2)

    ggQQ <- function(fit) {
      # extract standardized residuals from the fit
      d <- data.frame(std.resid = rstandard(fit))
      # calculate 1Q/4Q line
      y <- quantile(d$std.resid[!is.na(d$std.resid)], c(0.25, 0.75))
      x <- qnorm(c(0.25, 0.75))
      slope <- diff(y)/diff(x)
      int <- y[1L] - slope * x[1L]
    
      p <- ggplot(data=d, aes(sample=std.resid)) +
        stat_qq(shape=1, size=3) +           # open circles
        labs(title="Normal Q-Q",             # plot title
             x="Theoretical Quantiles",      # x-axis label
             y="Standardized Residuals") +   # y-axis label
        geom_abline(slope = slope, intercept = int, linetype="dashed")  # dashed reference line
      return(p)
    }
    

    Example of use:

    # sample data (y = x + N(0,1), x in [1,100])
    df <- data.frame(x = 1:100, y = 1:100 + rnorm(100))
    ggQQ(lm(y~x,data=df))
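    Why standardized residuals? rstandard() divides each raw residual by its own estimated standard deviation, which depends on the leverage h_ii, so that all residuals are roughly on the N(0,1) scale the theoretical quantiles assume. A minimal sketch, assuming an ordinary unweighted lm() fit and using the built-in mtcars data purely as an illustration, that recomputes them by hand:

```r
# Fit a small example model (mtcars ships with base R)
fit <- lm(mpg ~ wt, data = mtcars)

# Raw residuals e_i and leverages h_ii (diagonal of the hat matrix)
e <- residuals(fit)
h <- hatvalues(fit)

# Residual standard error: sqrt(RSS / residual degrees of freedom)
s <- sqrt(sum(e^2) / df.residual(fit))

# Standardized residuals: e_i / (s * sqrt(1 - h_ii))
std_manual <- e / (s * sqrt(1 - h))

head(cbind(manual = std_manual, rstandard = rstandard(fit)))
```

    High-leverage points get their residuals scaled up the most, which is why plotting raw residuals can understate how unusual they are.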
    
  • 2020-12-04 06:53

    The following code will give you the plot you want. The ggplot package doesn't seem to contain code for calculating the parameters of the qqline, so I don't know if it's possible to achieve such a plot in a (comprehensible) one-liner.

    library(ggplot2)

    qqplot.data <- function (vec) # argument: vector of numbers
    {
      # following four lines from base R's qqline()
      y <- quantile(vec[!is.na(vec)], c(0.25, 0.75))
      x <- qnorm(c(0.25, 0.75))
      slope <- diff(y)/diff(x)
      int <- y[1L] - slope * x[1L]
    
      d <- data.frame(resids = vec)
    
      ggplot(d, aes(sample = resids)) + stat_qq() + geom_abline(slope = slope, intercept = int)
    
    }
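    The four lines borrowed from qqline() can also be wrapped in a small helper that only returns the coefficients, which is handy for reusing them with geom_abline() or checking them. A sketch; qq_line_coef is a hypothetical name, and its probs argument mirrors the one base R's qqline() accepts:

```r
# Hypothetical helper (not from any package): intercept and slope of the
# line through the first and third quartiles, as base R's qqline() draws it.
qq_line_coef <- function(vec, probs = c(0.25, 0.75)) {
  vec <- vec[!is.na(vec)]                   # drop NAs, as qqline() does
  y <- quantile(vec, probs, names = FALSE)  # sample quartiles (default type 7)
  x <- qnorm(probs)                         # matching N(0,1) quantiles
  slope <- diff(y) / diff(x)
  c(intercept = y[1L] - slope * x[1L], slope = slope)
}

cf <- qq_line_coef(1:5)
cf
```

    The result plugs straight into geom_abline(intercept = cf[["intercept"]], slope = cf[["slope"]]). For a symmetric sample like 1:5 the intercept is the midpoint of the quartiles, here 3.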
    
  • 2020-12-04 06:57

    You could steal a page from the old-timers who did this with normal probability paper. A careful look at a ggplot() + stat_qq() graphic suggests that a reference line can be added with geom_abline(), like this:

    df <- data.frame( y=rpois(100, 4) )
    
    ggplot(df, aes(sample=y)) +
      stat_qq() +
      geom_abline(intercept=mean(df$y), slope = sd(df$y))
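    The same idea can be expressed by letting stat_qq() compute quantiles of N(mean, sd²) directly through its dparams argument, so the reference line becomes simply y = x. A sketch, assuming ggplot2 is installed; set.seed() only makes the random example data reproducible:

```r
library(ggplot2)

set.seed(1)                        # reproducible example data
df <- data.frame(y = rpois(100, 4))
m <- mean(df$y)
s <- sd(df$y)

# Theoretical quantiles now come from N(m, s^2) instead of N(0, 1),
# so the matching reference line is the identity line.
p <- ggplot(df, aes(sample = y)) +
  stat_qq(dparams = list(mean = m, sd = s)) +
  geom_abline(intercept = 0, slope = 1)

p
```

    Because qnorm(p, m, s) equals m + s * qnorm(p), this plot is the same as the one above up to a relabelled x-axis. Note that a mean/sd line is fitted by moments and is pulled around by outliers more than qqline()'s quartile line.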
    
  • 2020-12-04 07:10

    Why not the following?

    Given some vector, say,

    myresiduals <- rnorm(100) ^ 2
    
    ggplot(data = as.data.frame(qqnorm(myresiduals, plot.it = FALSE)), mapping = aes(x = x, y = y)) + 
        geom_point() + geom_smooth(method = "lm", se = FALSE)
    

    But it seems strange that we have to use a traditional graphics function to prop up ggplot2.

    Can't we get the same effect somehow by starting with the vector for which we want the quantile plot and then applying the appropriate "stat" and "geom" functions in ggplot2?

    Does Hadley Wickham monitor these posts? Maybe he can show us a better way.
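    One caveat about this approach: geom_smooth(method = "lm") fits a least-squares line through all the Q-Q points, whereas qqline() draws the line through the first and third quartiles, so for skewed residuals the two lines can differ noticeably. A small base-R sketch comparing both on an arbitrary, deterministic right-skewed vector (the squares of 1 to 20, chosen only for illustration):

```r
# Deterministic, right-skewed example vector
v <- (1:20)^2

# Q-Q coordinates from base R, without drawing anything
qq <- as.data.frame(qqnorm(v, plot.it = FALSE))

# What geom_smooth(method = "lm") fits: least squares through all points
ols <- unname(coef(lm(y ~ x, data = qq)))

# What qqline() draws: the line through the first and third quartiles
qs <- quantile(v, c(0.25, 0.75), names = FALSE)
zs <- qnorm(c(0.25, 0.75))
slope <- diff(qs) / diff(zs)
quartile <- c(qs[1] - slope * zs[1], slope)

# Rows: method; columns: intercept, slope
rbind(ols = ols, quartile = quartile)
```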

  • 2020-12-04 07:12

    As of ggplot2 3.0.0, the new function stat_qq_line() is available (https://github.com/tidyverse/ggplot2/blob/master/NEWS.md), and a Q-Q line for model residuals can be added with:

    library(ggplot2)
    model <- lm(mpg ~ wt, data=mtcars)
    ggplot(model, aes(sample = rstandard(model))) + geom_qq() + stat_qq_line()
    

    rstandard(model) is needed to get the standardized residuals. (credit @jlhoward and @qwr)

    If you get 'Error in stat_qq_line() : could not find function "stat_qq_line"', your ggplot2 version is too old; upgrade the package with install.packages("ggplot2").
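    stat_qq_line() uses line.p = c(0.25, 0.75) by default, i.e. the same first-to-third-quartile line that qqline() draws. A sketch, assuming ggplot2 >= 3.0.0, that passes a plain data frame of standardized residuals (rather than the model object) and checks the drawn line against the quartile formula with layer_data():

```r
library(ggplot2)

model <- lm(mpg ~ wt, data = mtcars)
r <- rstandard(model)

p <- ggplot(data.frame(std.resid = r), aes(sample = std.resid)) +
  geom_qq() +
  stat_qq_line()            # default line.p = c(0.25, 0.75)

# Endpoints of the drawn line (layer 2) and the slope/intercept they imply
line_pts <- layer_data(p, 2)
gg_slope <- diff(line_pts$y) / diff(line_pts$x)
gg_int <- line_pts$y[1] - gg_slope * line_pts$x[1]

# The same quantities computed the way qqline() does
qs <- quantile(r, c(0.25, 0.75), names = FALSE)
zs <- qnorm(c(0.25, 0.75))
base_slope <- diff(qs) / diff(zs)
base_int <- qs[1] - base_slope * zs[1]

c(ggplot2 = gg_slope, qqline = base_slope)
```

    Building the data frame explicitly avoids relying on ggplot2's fortify() method for lm objects.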
