I have a dataframe with a column of categorical data (two possible values) and multiple variable columns. I need to plot multiple box-plots, one for each variable column. Ea
I would use `sapply` over a vector of column numbers and subset `mydata` to the column of interest within the function. Iterating over column numbers rather than over the columns themselves gives you easy access to the correct column name (`colnames(mydata)[i]`) to add to the plot later.
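Here is a minimal sketch of the index-based pattern on its own. It uses a small hypothetical `mydata` whose first column is the grouping variable `categ`, and the label text built inside the function is purely illustrative.

```r
# Hypothetical example data: 'categ' is the grouping column (coded 1/2),
# followed by two measurement columns.
mydata <- data.frame(categ = c(1, 1, 2, 2),
                     var1  = c(5.1, 4.8, 6.3, 6.0),
                     var2  = c(0.2, 0.4, 0.9, 1.1))

# seq_along(mydata) gives one index per column (1, 2, 3);
# [-1] drops the first index so the grouping column is skipped.
idx <- seq_along(mydata)[-1]

# The function receives an index rather than the column itself,
# so both the values and the column name are available inside it.
labels <- sapply(idx, function(i) {
  y <- mydata[, i]                              # subset to one variable column
  paste0(colnames(mydata)[i], " (max ", max(y), ")")
})
print(labels)
```

Here `idx` is `2 3`, so each call sees the values and the name of exactly one variable column.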
You also need to add a small outer margin (`oma`) on side 3 (the top) so that the p values for the first three plots, the top row of the grid, have room to be printed.
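You can try the margin layout in isolation. The sketch below is an illustration under assumptions, not the full solution. It opens a null PDF device with `pdf(NULL)` so it runs without a screen, reuses the `mar` and `oma` values from the answer, and writes a hypothetical label above each of nine simple panels.

```r
# Draw to a null PDF device so the sketch runs without a display
pdf(NULL)

# 3 x 3 grid, a tiny top figure margin (0.5 lines),
# plus one line of outer margin on the top for the first-row labels
op <- par(mfrow = c(3, 3), mar = c(3, 3, 0.5, 0.5), oma = c(0, 0, 1, 0))

for (k in 1:9) {
  plot(1:10)
  # Hypothetical label written 0.5 lines above the plot region (side 3)
  mtext(paste("panel", k), side = 3, line = 0.5, cex = 0.6)
}

current_oma <- par("oma")  # record the outer margins in use
par(op)                    # restore the previous settings
invisible(dev.off())
```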
To address your second question, reducing the y limits to fit the range of the data: this happens automatically if you specify `outline=FALSE` to suppress plotting of outliers. (In your code, you simply supplied `NA` as the plotting character to hide them, but `boxplot` still treated them as part of the data when determining the axis limits.) Note, however, that with `outline=FALSE` the calculated y limits no longer accommodate outliers, so any outlying values drawn afterwards by the call to `stripchart` (which I've now replaced with `points`, since it's a bit simpler) will fall outside the plotting region.
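```r
# 3 x 3 grid; oma adds one line of outer margin at the top for the first-row labels
par(mfrow = c(3, 3), mar = c(3, 3, 0.5, 0.5), mgp = c(1.5, 0.3, 0), tck = -0.01,
    oma = c(0, 0, 1, 0))
# Column 1 is the grouping variable (numeric, coded 1/2), so skip it with [-1];
# invisible() stops sapply's list of NULL return values from being printed
invisible(sapply(seq_along(mydata)[-1], function(i) {
  y <- mydata[, i]
  # outline = FALSE: no outlier symbols, and the y limits come from the whiskers
  boxplot(y ~ mydata$categ, outline = FALSE, ylab = "VarLevel", tck = 1.0,
          names = c("categ1", "categ2"), las = 1)
  # Jittered raw values; x = 1 and 2 line up with the boxes because categ is 1/2
  points(y ~ jitter(mydata$categ, 0.5),
         col = ifelse(mydata$categ == 1, "firebrick", "slateblue"))
  # Wilcoxon rank-sum test between the two groups
  test <- wilcox.test(y ~ mydata$categ)
  pvalue <- test$p.value
  pvalueformatted <- format(pvalue, digits = 3, nsmall = 2)
  # Column name and p value in the top margin (side 3)
  mtext(paste(colnames(mydata)[i], "p =", pvalueformatted), side = 3,
        line = 0.5, at = 0.9, cex = 0.6)
}))
```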
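To try the code without your own file, the sketch below simulates a hypothetical `mydata` in the shape the code assumes. The first column is a numeric `categ` coded 1 and 2, followed by eight variable columns, which fill eight of the nine panels. The column names, sample size, and group shift are all made up for illustration.

```r
set.seed(42)  # reproducible simulated values
n <- 30       # 15 observations per group

# Grouping column first, coded 1/2 as the plotting code expects
mydata <- data.frame(categ = rep(c(1, 2), each = n / 2))

# Eight hypothetical variable columns; group 2 is shifted upward by 0.5 * j
for (j in 1:8) {
  mydata[[paste0("var", j)]] <- rnorm(n, mean = 10 + 0.5 * j * (mydata$categ - 1))
}

# The same p values the plotting loop prints, computed without drawing
pvals <- sapply(seq_along(mydata)[-1], function(i) {
  wilcox.test(mydata[, i] ~ mydata$categ)$p.value
})
```

Running the `par(...)` and `sapply(...)` plotting code after this should fill eight panels of the 3 x 3 grid.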
Note that I've also modified your `mtext` call to plot on side 3 rather than on side 1 with a large negative `line` value.