Is there a way to create a boxplot in R that will display, somewhere along with the box, an "N=(sample size)"? The varwidth logical adjusts the width of the box on the basis of sample size.
Here's some ggplot2 code. It's going to display the sample size at the sample mean, making the label multifunctional!
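First, a simple function to pass as fun.data. It returns the y position of the label (the group mean) and the label itself (the group size):
library(ggplot2)

# y = where the label is drawn (the mean); label = the number of observations
give.n <- function(x){
  return(c(y = mean(x), label = length(x)))
}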
Now, to demonstrate with the diamonds data (which ships with ggplot2):
ggplot(diamonds, aes(cut, price)) +
  geom_boxplot() +
  stat_summary(fun.data = give.n, geom = "text")
You may have to adjust the text size to make it look good, but now you have a label for the sample size that also gives a sense of the skew: because the label sits at the mean, its distance from the median line shows how skewed each group is.
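A variant of the same idea, as a sketch: the hypothetical helper give.n.label below returns a data frame so the label can read "n=<count>", and it places the label at the median instead of the mean (a swapped-in choice, not part of the answer above). The size and vjust values are assumptions to tune; stat_summary forwards them to geom_text.

```r
library(ggplot2)

# Hypothetical helper: label text "n=<count>", positioned at the group median
give.n.label <- function(x) {
  data.frame(y = median(x), label = paste0("n=", length(x)))
}

p <- ggplot(diamonds, aes(cut, price)) +
  geom_boxplot() +
  # size and vjust are passed on to geom_text; a negative vjust nudges the label upward
  stat_summary(fun.data = give.n.label, geom = "text", size = 3, vjust = -0.5)
print(p)
```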
I figured out a workaround using the EnvStats package. The package has to be installed once and then loaded:
install.packages("EnvStats")
library(EnvStats)
EnvStats's stripChart (different from base R's stripchart) can add values such as the sample size n of each group to a plot. First I drew my boxplot, then I called stripChart with add = TRUE so it draws on top of the existing boxplot. Most of stripChart's own elements (points, confidence intervals, axes, location/scale text) are switched off so they don't show up on the boxplot, leaving only the n values.
Boxplot with an integrated stripChart to show n values. First, the boxplot:
# Set the margins before plotting (par() should not be passed as an argument to boxplot)
par(mar = c(8, 4, 4, 2))
boxplot(data.frame(T0_G1, T24h_G1, T96h_G1, T7d_G1, T11d_G1, T15d_G1, T30d_G1),
        main = "All Rheometry Tests on Egg Plasma at All Time Points at 0.1Hz,0.1% and 37 Set 1,2,3",
        names = c("0h", "24h", "96h", "7d", "11d", "15d", "30d"),
        boxwex = 0.6)
Then the stripChart, with everything except the n values hidden:
# points.cex = 0 hides the points; show.ci, axes and location.scale.text hide the rest
stripChart(data.frame(T0_G1, T24h_G1, T96h_G1, T7d_G1, T11d_G1, T15d_G1, T30d_G1),
           show.ci = FALSE, axes = FALSE, points.cex = 0,
           n.text.line = 1.6, n.text.cex = 0.7, add = TRUE,
           location.scale.text = "none")
You can adjust the height of the n values with n.text.line (and their size with n.text.cex) so that they fit where you want.
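A reproducible sketch of the same two-step approach, assuming EnvStats is installed. The built-in mtcars data (mpg split by number of cylinders) stands in for the T*_G1 vectors, and the stripChart arguments are the ones used above. A list is passed instead of a data frame because the groups have unequal sizes; that stripChart accepts such a list is an assumption based on its documented x argument.

```r
library(EnvStats)

# Assumption: mtcars stands in for the original data; the groups hold 11, 7 and 14 cars
groups <- split(mtcars$mpg, mtcars$cyl)

# Widen the bottom margin first, then draw the boxplot
par(mar = c(8, 4, 4, 2))
boxplot(groups, boxwex = 0.6, xlab = "cylinders", ylab = "mpg")

# Overlay stripChart with points, CIs, axes and location/scale text hidden,
# so only the n values are added to the existing plot
stripChart(groups, show.ci = FALSE, axes = FALSE, points.cex = 0,
           n.text.line = 1.6, n.text.cex = 0.7, add = TRUE,
           location.scale.text = "none")
```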
To get the n above each box, you could use text with the stats details returned by boxplot, as follows:
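# xvar, f1 and frame stand for your own variable, grouping factor and data frame
# Draw the boxplot and keep its statistics (with plot = 0 there would be no plot for text() to annotate)
b <- boxplot(xvar ~ f1, data = frame)
# Write "n=..." just above the upper whisker of each box; xpd = TRUE avoids clipping
text(seq_along(b$n), b$stats[5, ], paste0("n=", b$n), pos = 3, xpd = TRUE)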
The stats field of b is a matrix with one column per group. Its rows hold the extreme of the lower whisker, the lower hinge, the median, the upper hinge and the extreme of the upper whisker, so b$stats[5, ] is the top of each upper whisker.
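To make the layout of b$stats concrete, here is a sketch with the built-in mtcars data standing in for the xvar ~ f1 placeholders. It names the five rows, prints the matrix and the counts, and then writes the labels above the whiskers; the row names and xpd = TRUE are additions for illustration.

```r
# Built-in data standing in for xvar ~ f1 and frame
b <- boxplot(mpg ~ cyl, data = mtcars)

# One column per group; name the five rows to show what each holds
rownames(b$stats) <- c("lower.whisker", "lower.hinge", "median",
                       "upper.hinge", "upper.whisker")
print(b$stats)
print(b$n)

# Row 5 is the top of each upper whisker: write the count just above it
text(seq_along(b$n), b$stats["upper.whisker", ], paste0("n=", b$n),
     pos = 3, xpd = TRUE)
```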
The gplots package provides boxplot.n, which, according to its documentation, produces a boxplot annotated with the number of observations.
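A minimal sketch, assuming gplots is installed and that boxplot.n passes a formula through to boxplot like the other examples here. top = TRUE asks for the counts along the top of the plot instead of the default bottom; check ?gplots::boxplot.n for the exact arguments in your version.

```r
library(gplots)

# boxplot.n draws the boxplot and writes "n=<count>" for each group
boxplot.n(mpg ~ cyl, data = mtcars, top = TRUE,
          xlab = "cylinders", ylab = "mpg")
```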
You can use the names parameter to write the n next to each factor name.
If you don't want to calculate the n yourself, you could use this little trick:
# Compute the boxplot statistics but do not draw anything
b <- boxplot(xvar ~ f1, data = frame, plot = FALSE)
# b$n now holds the count for each factor level; write it into the names
boxplot(xvar ~ f1, data = frame, xlab = "input values",
        names = paste0(b$names, " (n=", b$n, ")"))
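The same trick on built-in data, as a sketch: mtcars stands in for the frame, xvar and f1 placeholders. paste0 is used so the labels come out as "4 (n=11)" without extra spaces around the count.

```r
# Built-in data standing in for frame, xvar and f1
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)
labels <- paste0(b$names, " (n=", b$n, ")")
print(labels)  # "4 (n=11)" "6 (n=7)" "8 (n=14)"
boxplot(mpg ~ cyl, data = mtcars, xlab = "cylinders", names = labels)
```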