qqnorm and qqline in ggplot2

隐瞒了意图╮ 2020-12-04 06:37

Say I have a linear model LM and want a Q-Q plot of its residuals. Normally I would use R base graphics:

qqnorm(residuals(LM), ylab = "Residuals")
qqline         


        
8 Answers
  • 2020-12-04 07:13

    As of v3.0.0, ggplot2 has a built-in Q-Q line stat, stat_qq_line(). From the help page:

    df <- data.frame(y = rt(200, df = 5))
    p <- ggplot(df, aes(sample = y))
    p + stat_qq() + stat_qq_line()
    

    [Figure: ggplot2 v3.0.0 example, stats equivalent to qqnorm plus abline]
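
    Applied to the question's setup, a minimal sketch might look like the following. The model `LM` is fitted here on the built-in `cars` data purely for illustration (an assumption, since the question does not say what the model is), and the residuals are wrapped in a data frame because ggplot2 expects one.

    ```r
    library(ggplot2)

    # Hypothetical model standing in for the question's LM (built-in cars data)
    LM <- lm(dist ~ speed, data = cars)

    # ggplot2 needs a data frame; map the residuals to the `sample` aesthetic
    res <- data.frame(resid = residuals(LM))

    p <- ggplot(res, aes(sample = resid)) +
        stat_qq() +        # the points, as qqnorm() would draw them
        stat_qq_line() +   # the reference line, as qqline() would draw it
        ylab("Residuals")
    print(p)
    ```

    The `ylab("Residuals")` call plays the role of the `ylab` argument in the base graphics version.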

  • 2020-12-04 07:15

    Since version 3.0, a stat_qq_line equivalent to the below has found its way into the official ggplot2 code.


    Old version:

    Since version 2.0, ggplot2 has had a well-documented interface for extensions, so we can now easily write a new stat for the qqline by itself (this is my first attempt at one, so improvements are welcome):

    library(ggplot2)  # provides ggproto(), Stat and layer()

    qq.line <- function(data, qf, na.rm) {
        # from stackoverflow.com/a/4357932/1346276
        q.sample <- quantile(data, c(0.25, 0.75), na.rm = na.rm)
        q.theory <- qf(c(0.25, 0.75))
        slope <- diff(q.sample) / diff(q.theory)
        intercept <- q.sample[1] - slope * q.theory[1]
    
        list(slope = slope, intercept = intercept)
    }
    
    StatQQLine <- ggproto("StatQQLine", Stat,
        # http://docs.ggplot2.org/current/vignettes/extending-ggplot2.html
        # https://github.com/hadley/ggplot2/blob/master/R/stat-qq.r
        
        required_aes = c('sample'),
        
        compute_group = function(data, scales,
                                 distribution = stats::qnorm,
                                 dparams = list(),
                                 na.rm = FALSE) {
            qf <- function(p) do.call(distribution, c(list(p = p), dparams))
            
            n <- length(data$sample)
            theoretical <- qf(stats::ppoints(n))
            qq <- qq.line(data$sample, qf = qf, na.rm = na.rm)
            line <- qq$intercept + theoretical * qq$slope
    
            data.frame(x = theoretical, y = line)
        } 
    )
    
    stat_qqline <- function(mapping = NULL, data = NULL, geom = "line",
                            position = "identity", ...,
                            distribution = stats::qnorm,
                            dparams = list(),
                            na.rm = FALSE,
                            show.legend = NA, 
                            inherit.aes = TRUE) {
        layer(stat = StatQQLine, data = data, mapping = mapping, geom = geom,
              position = position, show.legend = show.legend, inherit.aes = inherit.aes,
              params = list(distribution = distribution,
                            dparams = dparams,
                            na.rm = na.rm, ...))
    }
    

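    The quartile construction in `qq.line` can be checked in base R without ggplot2. The sketch below (sample data and variable names are made up for illustration) computes the slope and intercept the same way and confirms that the resulting line passes through the first and third quartile points, which is how `qqline()` is documented to place its line by default.

    ```r
    set.seed(1)                          # arbitrary seed, for reproducibility
    x <- rnorm(100, mean = 10, sd = 2)   # illustrative sample

    q.sample <- quantile(x, c(0.25, 0.75), names = FALSE)  # sample quartiles
    q.theory <- qnorm(c(0.25, 0.75))                       # normal quartiles

    slope     <- diff(q.sample) / diff(q.theory)
    intercept <- q.sample[1] - slope * q.theory[1]

    # The line should reproduce both sample quartiles at the theoretical ones
    fitted <- intercept + slope * q.theory
    all.equal(fitted, q.sample)
    ```

    For roughly normal data the slope approximates the standard deviation and the intercept approximates the mean, which is why the line is a useful visual reference.
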
    This also generalizes over the distribution (exactly like stat_qq does), and can be used as follows:

    > test.data <- data.frame(sample=rnorm(100, 10, 2)) # normal distribution
    > test.data.2 <- data.frame(sample=rt(100, df=2))   # t distribution
    > ggplot(test.data, aes(sample=sample)) + stat_qq() + stat_qqline()
    > ggplot(test.data.2, aes(sample=sample)) + stat_qq(distribution=qt, dparams=list(df=2)) +
    +   stat_qqline(distribution=qt, dparams=list(df=2))
    

    (Unfortunately, since the qqline is on a separate layer, I couldn't find a way to "reuse" the distribution parameters, but that should only be a minor problem.)
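
    One way to avoid repeating the distribution arguments is to wrap both layers in a small helper that returns them as a list, since ggplot2 accepts a list of layers on the right-hand side of `+`. The helper name `qq_layers` is hypothetical, and this sketch uses the built-in `stat_qq_line()` from ggplot2 3.0.0 or later rather than the custom `stat_qqline` above.

    ```r
    library(ggplot2)

    # Hypothetical helper: build the points and the line from one set of
    # distribution arguments so the two layers cannot drift apart.
    qq_layers <- function(distribution = stats::qnorm, dparams = list()) {
        list(
            stat_qq(distribution = distribution, dparams = dparams),
            stat_qq_line(distribution = distribution, dparams = dparams)
        )
    }

    test.data.2 <- data.frame(sample = rt(100, df = 2))   # t distribution
    p <- ggplot(test.data.2, aes(sample = sample)) +
        qq_layers(distribution = qt, dparams = list(df = 2))
    print(p)
    ```

    This does not share parameters between layers inside ggplot2 itself; it only ensures both calls receive the same arguments.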
