First of all, I'm still a beginner. I'm trying to interpret and draw a stacked bar plot with R. I already took a look at a number of answers but some were not specific to my
I'm basically answering a different question. I suppose this can be seen as perversity on my part, but I really dislike barplots of pretty much any sort. They have always seemed to waste space, because the numerical values they present are less useful than an appropriately constructed table. The vcd package offers an extended mosaicplot function that seems to me to be more accurately called a "multivariate barplot" than any of the ones I have seen so far. It does require that you first construct a contingency table, for which the xtabs function seems a perfect fit.
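Before plotting, it can help to look at the table that xtabs builds. This is only a minimal sketch: the data frame `toy` and its values are made up for illustration and are not the asker's data.

```r
# Made-up observations (illustration only; not the asker's data)
toy <- data.frame(
  Variant = c("elke", "iedere", "elke", "iedere", "elke", "elke"),
  Region  = c("NL", "NL", "VL", "VL", "VL", "NL"),
  Time    = c("time", "no time", "time", "time", "no time", "time")
)

# A one-sided formula counts the rows for every combination of the listed factors
tab <- xtabs(~ Variant + Region + Time, data = toy)

ftable(tab)  # flat, printable view of the 3-way table
dim(tab)     # one dimension per factor
```

Each cell of `tab` becomes one tile in the mosaic plot, so the table shows the numbers behind the tile areas.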
install.packages("vcd")
library(vcd)
help(package=vcd,mosaic)
col=c("paleturquoise3", "palegreen3")
vcd::mosaic(xtabs(~Variant+Region + PrecededByPrep + Time, data=ttt)
,highlighting="Variant", highlighting_fill=col)
That was the 4-way plot and this is the 5-way plot:
png(); vcd::mosaic( xtabs(
~Variant+Region + PrecededByPrep + Person + Time,
data=ttt)
,highlighting="Variant", highlighting_fill=col); dev.off()
Here is my proposal for a solution using the barplot function of base R:
1. calculate the counts
l_count_df<-lapply(colnames(t)[-1],function(nomcol){table(t$Variant,t[,nomcol])})
count_df<-l_count_df[[1]]
for (i in 2:length(l_count_df)){
count_df<-cbind(count_df,l_count_df[[i]])
}
2. draw the barplot without axis names, saving the bar coordinates
par(las=1,col.axis="#404040",mar=c(5,4.5,4,2),mgp=c(3.5,1,0))
bp<-barplot(count_df,width=1.2,space=rep(c(1,0.3),4),col=c("paleturquoise3", "palegreen3"),border="#404040", axisnames=FALSE, ylab="Frequency",
legend.text=row.names(count_df),ylim=c(0,max(colSums(count_df))*1.2))
3. label the bars
mtext(side=1,line=0.8,at=bp,text=colnames(count_df))
mtext(side=1,line=2,at=(bp[seq(1,8,by=2)]+bp[seq(2,8,by=2)])/2,text=colnames(t)[-1],font=2)
4. add values inside the bars
for(i in 1:ncol(count_df)){
val_elke<-count_df[1,i]
val_iedere<-count_df[2,i]
text(bp[i],val_elke/2,val_elke)
text(bp[i],val_elke+val_iedere/2,val_iedere)
}
Here is what I get (with my random data):
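The counting in step 1 could also be written without the explicit loop, by handing the list of tables to cbind in a single call. This is a sketch under assumptions: the data frame `dat` and its values are invented here, and it is named `dat` rather than `t` so that base R's t() is not masked.

```r
# Made-up data (illustration only); factors with fixed levels guarantee
# that every table has the same two rows and two columns
set.seed(1)
dat <- data.frame(
  Variant = factor(sample(c("elke", "iedere"), 40, replace = TRUE),
                   levels = c("elke", "iedere")),
  Region  = factor(sample(c("NL", "VL"), 40, replace = TRUE),
                   levels = c("NL", "VL")),
  Time    = factor(sample(c("no time", "time"), 40, replace = TRUE),
                   levels = c("no time", "time"))
)

# One Variant-by-level table per predictor, bound column-wise in one call
count_df <- do.call(
  cbind,
  lapply(colnames(dat)[-1], function(nomcol) table(dat$Variant, dat[[nomcol]]))
)
count_df
```

The result has one row per Variant and one column per predictor level, which is the layout barplot expects for stacked bars.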
Here is one possibility, which starts with the 'un-tabulated' data frame: melt it, plot it with geom_bar in ggplot2 (which does the counting per group), and separate the plot by variable using facet_wrap.
Create toy data:
set.seed(123)
df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
Person = sample(c("person", "no person"), size = 50, replace = TRUE),
Time = sample(c("time", "no time"), size = 50, replace = TRUE))
Reshape data:
library(reshape2)
df2 <- melt(df, id.vars = "Variant")
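reshape2 is retired (its maintainers recommend tidyr instead), so the same reshaping can also be sketched with tidyr::pivot_longer. The small data frame below is made up so the block runs on its own. The column names "variable" and "value" are chosen to match what melt produces. Note that the row order differs from melt's output, which does not affect the plot.

```r
library(tidyr)

# Tiny made-up data frame (illustration only)
df_small <- data.frame(
  Variant = c("elke", "iedere"),
  Region  = c("VL", "NL"),
  Time    = c("time", "no time")
)

# Every column except Variant is stacked into variable/value pairs
df2_small <- pivot_longer(df_small, cols = -Variant,
                          names_to = "variable", values_to = "value")
df2_small
```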
Plot:
library(ggplot2)
ggplot(data = df2, aes(factor(value), fill = Variant)) +
geom_bar() +
facet_wrap(~variable, nrow = 1, scales = "free_x") +
scale_fill_grey(start = 0.5) +
theme_bw()
There are lots of opportunities to customize the plot, such as setting order of factor levels, rotating axis labels, wrapping facet labels on two lines (e.g. for the longer variable name "PrecededByPrep"), or changing spacing between facets.
Customization (following updates in question and comments by OP)
# labeller used in facet_grid to wrap "PrecededByPrep" on two lines
# see http://www.cookbook-r.com/Graphs/Facets_%28ggplot2%29/#modifying-facet-label-text
# In ggplot2 >= 2.0 a labeller receives a data frame of labels, so the old
# function(var, value) form no longer works; as_labeller() wraps a plain
# function that maps a character vector of labels to new labels.
my_lab <- as_labeller(function(value) {
ifelse(value == "PrecededByPrep", "Preceded\nByPrep", value)
})
ggplot(data = df2, aes(factor(value), fill = Variant)) +
geom_bar() +
facet_grid(~variable, scales = "free_x", labeller = my_lab) +
scale_fill_manual(values = c("paleturquoise3", "palegreen3")) + # manual fill colors
theme_bw() +
theme(axis.text = element_text(face = "bold"), # axis tick labels bold
axis.text.x = element_text(angle = 45, hjust = 1), # rotate x axis labels
line = element_line(colour = "gray25"), # line colour gray25 = #404040
strip.text = element_text(face = "bold")) + # facet labels bold
xlab("factors") + # set axis labels
ylab("frequency")
Add counts to each bar (edit following comments from OP).
The basic principles to calculate the y coordinates can be found in this Q&A. Here I use dplyr to calculate counts per bar (i.e. label in geom_text) and their y coordinates, but this could of course be done in base R, plyr or data.table.
# calculate counts (i.e. labels for geom_text) and their y positions.
library(dplyr)
df3 <- df2 %>%
group_by(variable, value, Variant) %>%
summarise(n = n()) %>%
mutate(y = cumsum(n) - (0.5 * n))
# plot
ggplot(data = df2, aes(x = factor(value), fill = Variant)) +
geom_bar() +
geom_text(data = df3, aes(y = y, label = n)) +
facet_grid(~variable, scales = "free_x", labeller = my_lab) +
scale_fill_manual(values = c("paleturquoise3", "palegreen3")) + # manual fill colors
theme_bw() +
theme(axis.text = element_text(face = "bold"), # axis tick labels bold
axis.text.x = element_text(angle = 45, hjust = 1), # rotate x axis labels
line = element_line(colour = "gray25"), # line colour gray25 = #404040
strip.text = element_text(face = "bold")) + # facet labels bold
xlab("factors") + # set axis labels
ylab("frequency")
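With ggplot2 2.2.0 or later, the label positions need not be computed by hand: position_stack(vjust = 0.5) places each label in the middle of its bar segment and follows whatever stacking order geom_bar uses. That order was reversed in 2.2.0, so hand-computed cumsum positions may end up on the wrong segment in newer versions. This is a minimal sketch: the long-format data frame `long_df` is made up here so the block runs on its own, and it mimics the structure of df2.

```r
library(ggplot2)

# Small made-up data in the same long format as df2 (illustration only)
long_df <- data.frame(
  Variant  = c("elke", "iedere", "elke", "elke", "iedere", "iedere"),
  variable = "Region",
  value    = c("NL", "NL", "VL", "VL", "VL", "NL")
)

# Count rows per bar segment with base R and drop empty combinations
counts <- as.data.frame(
  table(Variant = long_df$Variant, variable = long_df$variable, value = long_df$value),
  responseName = "n"
)
counts <- counts[counts$n > 0, ]

p <- ggplot(long_df, aes(x = factor(value), fill = Variant)) +
  geom_bar() +
  # y = n gives the segment height; position_stack centres the label in it
  geom_text(data = counts, aes(y = n, label = n),
            position = position_stack(vjust = 0.5)) +
  facet_grid(~variable, scales = "free_x")
p
```

The fill mapping is inherited by geom_text only for grouping, so bars and labels are stacked in the same order.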