Say have a linear model LM that I want a qq plot of the residuals. Normally I would use the R base graphics:
qqnorm(residuals(LM), ylab=\"Residuals\")
qqline
ggplot2 v.3.0.0 now has an qqline stat. From the help page:
df <- data.frame(y = rt(200, df = 5))
p <- ggplot(df, aes(sample = y))
p + stat_qq() + stat_qq_line()
!ggplot2 v3.0.0 Example stats equivalent to qqnorm plus abline]1
Since version 3.0, a stat_qq_line
equivalent to the below has found its way into the official ggplot2 code.
Old version:
Since version 2.0, ggplot2 has a well-documented interface for extension; so we can now easily write a new stat for the qqline by itself (which I've done for the first time, so improvements are welcome):
qq.line <- function(data, qf, na.rm) {
# from stackoverflow.com/a/4357932/1346276
q.sample <- quantile(data, c(0.25, 0.75), na.rm = na.rm)
q.theory <- qf(c(0.25, 0.75))
slope <- diff(q.sample) / diff(q.theory)
intercept <- q.sample[1] - slope * q.theory[1]
list(slope = slope, intercept = intercept)
}
StatQQLine <- ggproto("StatQQLine", Stat,
# http://docs.ggplot2.org/current/vignettes/extending-ggplot2.html
# https://github.com/hadley/ggplot2/blob/master/R/stat-qq.r
required_aes = c('sample'),
compute_group = function(data, scales,
distribution = stats::qnorm,
dparams = list(),
na.rm = FALSE) {
qf <- function(p) do.call(distribution, c(list(p = p), dparams))
n <- length(data$sample)
theoretical <- qf(stats::ppoints(n))
qq <- qq.line(data$sample, qf = qf, na.rm = na.rm)
line <- qq$intercept + theoretical * qq$slope
data.frame(x = theoretical, y = line)
}
)
stat_qqline <- function(mapping = NULL, data = NULL, geom = "line",
position = "identity", ...,
distribution = stats::qnorm,
dparams = list(),
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
layer(stat = StatQQLine, data = data, mapping = mapping, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(distribution = distribution,
dparams = dparams,
na.rm = na.rm, ...))
}
This also generalizes over the distribution (exactly like stat_qq
does), and can be used as follows:
> test.data <- data.frame(sample=rnorm(100, 10, 2)) # normal distribution
> test.data.2 <- data.frame(sample=rt(100, df=2)) # t distribution
> ggplot(test.data, aes(sample=sample)) + stat_qq() + stat_qqline()
> ggplot(test.data.2, aes(sample=sample)) + stat_qq(distribution=qt, dparams=list(df=2)) +
+ stat_qqline(distribution=qt, dparams=list(df=2))
(Unfortunately, since the qqline is on a separate layer, I couldn't find a way to "reuse" the distribution parameters, but that should only be a minor problem.)