[Update: Although I've accepted an answer, please add another answer if you have additional visualization ideas (whether in R or another language/program). Tex
Here are a few ideas for plotting your data. I've used ggplot2, and I've reformatted the data a bit in places.
I've used a stacked barplot to mimic your mosaic plot and solve the alignment issue.
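To make it clearer what the stacked bars encode, here is a rough sketch of the proportions that `position="fill"` draws: within each term, the counts per standing are rescaled to sum to 1. The `toy` data frame and its values are made up for illustration; only the column names follow the data used in this answer.

```r
# Toy data, invented for illustration (not the real df1).
toy = data.frame(
  term     = c(1, 1, 1, 1, 2, 2, 2, 2),
  standing = c("Good", "Good", "Good", "Probation",
               "Good", "Probation", "Probation", "Suspended")
)

# Count students per standing (rows) and term (columns).
counts = table(toy$standing, toy$term)

# Rescale each column to sum to 1; these are the bar segment heights.
props = prop.table(counts, margin = 2)
print(round(props, 2))
```

Each column of `props` corresponds to one bar, so segment heights are comparable across terms even when the number of students differs.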
Data points for each student are connected by a gray line, making this reminiscent of a parallel coordinates plot. Coloring the points shows the categorical standing. Using GPA on the y-axis helps spread out the points to reduce overplotting, and shows the correlation between standing and GPA. A major problem is that many valid standing data points drop out because they lack a matching termGPA value.
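Before relying on the GPA-based plots, it may be worth checking how many points are lost this way. The following is only a sketch on a made-up `toy` data frame (the values are invented; the column names mirror the ones used in this answer).

```r
# Toy data, invented for illustration (not the real df1).
toy = data.frame(
  id       = c(1, 1, 2, 2, 3, 3),
  term     = c(1, 2, 1, 2, 1, 2),
  termGPA  = c(3.2, NA, 2.1, 1.8, NA, 3.5),
  standing = c("Good", "Good", "Probation", "Suspended", "Good", NA)
)

# Rows with a known standing but no GPA cannot be placed on the y-axis,
# so they silently disappear from the GPA-based plots.
lost = !is.na(toy$standing) & is.na(toy$termGPA)
n_lost = sum(lost)
print(n_lost)

# Break the lost rows down by term to see where the gaps concentrate.
print(table(toy$term[lost]))
```

If the count is large relative to the data, the stacked barplot or the heatmap-style plot, which only need `standing`, are probably the safer views.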
Here I've created a new variable called initial_standing to use for faceting. Each panel contains students who match in both cohort and initial_standing. Plotting the id as text makes this figure a bit cluttered, but could be useful in some cases.
This plot is like a heatmap where each row is a student. I controlled the order of the id axis to force initial_standing and cohort groupings to stay together. If you have many more rows, you may want to consider sorting rows by some type of clustering.
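As a rough illustration of the clustering idea, the sketch below orders students by hierarchical clustering of their standing sequences. It is only one possible approach: the `toy` data is invented, and treating the standings as evenly spaced integer codes is an assumption made here for simplicity.

```r
# Toy data, invented for illustration: 4 students, 3 terms each.
toy = data.frame(
  id       = rep(c("a", "b", "c", "d"), each = 3),
  term     = rep(1:3, times = 4),
  standing = c("Good", "Good", "Good",
               "Probation", "Suspended", "Suspended",
               "Good", "Good", "Probation",
               "Probation", "Suspended", "Suspended")
)

# Assumption: encode standing as ordered integer codes (1 = best).
standing_levels = c("Good", "Probation", "Suspended")
toy$code = as.integer(factor(toy$standing, levels = standing_levels))

# Wide matrix: one row per student, one column per term.
wide = tapply(toy$code, list(toy$id, toy$term), function(x) x[1])

# Cluster the rows and read off the dendrogram's leaf order.
hc = hclust(dist(wide))
id_order = rownames(wide)[hc$order]

# Use that order for the id factor so similar students sit next to each other.
toy$id = factor(toy$id, levels = id_order)
print(id_order)
```

With real data, missing terms would need some handling before calling `dist()`, and a distance designed for categorical sequences may suit the data better than Euclidean distance on codes.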
library(ggplot2)
# Create new data frame for determining initial standing.
standing_data = data.frame(id=unique(df1$id), initial_standing=NA, cohort=NA)
for (i in seq_len(nrow(standing_data))) {
    id = standing_data$id[i]
    # Keep only this student's rows that have no missing values.
    subdat = df1[df1$id == id, ]
    subdat = subdat[complete.cases(subdat), ]
    # The standing in the earliest remaining term is the initial standing.
    initial_standing = subdat$standing[which.min(subdat$term)]
    standing_data[i, "initial_standing"] = as.character(initial_standing)
    standing_data[i, "cohort"] = as.character(subdat$cohort[1])
}
standing_data$cohort = factor(standing_data$cohort, levels=levels(df1$cohort))
standing_data$initial_standing = factor(standing_data$initial_standing,
                                        levels=levels(df1$standing))
# Add the new column (initial_standing) to df1.
df1 = merge(df1, standing_data[, c("id", "initial_standing")], by="id")
# Remove rows where standing is missing. Make some plots tidier.
df1 = df1[!is.na(df1$standing), ]
# Create id factor, controlling the sort order of the levels.
id_order = order(standing_data$initial_standing, standing_data$cohort)
df1$id = factor(df1$id, levels=as.character(standing_data$id)[id_order])
# Plot 1: stacked barplot of standing proportions per term, by cohort.
p1 = ggplot(df1, aes(x=term, fill=standing)) +
     geom_bar(position="fill", colour="grey20", size=0.5, width=1.0) +
     facet_grid(cohort ~ .) +
     scale_fill_brewer(palette="Set1")

# Plot 2: GPA trajectories per student, points coloured by standing.
p2 = ggplot(df1, aes(x=term, y=termGPA, group=id)) +
     geom_line(colour="grey70") +
     geom_point(aes(colour=standing), size=4) +
     facet_grid(cohort ~ .) +
     scale_colour_brewer(palette="Set1")

# Plot 3: as plot 2, but faceted by initial standing and cohort, with id labels.
p3 = ggplot(df1, aes(x=term, y=termGPA, group=id)) +
     geom_line(colour="grey70") +
     geom_point(aes(colour=standing), size=4) +
     geom_text(aes(label=id), hjust=-0.30, size=3) +
     facet_grid(initial_standing ~ cohort) +
     scale_colour_brewer(palette="Set1")
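# Plot 4: heatmap-style tiles, one row per student.
# opts() and theme_blank() have been removed from ggplot2;
# theme() and element_blank() are their replacements.
p4 = ggplot(df1, aes(x=term, y=id, fill=standing)) +
     geom_tile(colour="grey20") +
     facet_grid(initial_standing ~ ., space="free_y", scales="free_y") +
     scale_fill_brewer(palette="Set1") +
     theme(panel.grid.major=element_blank(),
           panel.grid.minor=element_blank())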
ggsave("plot_1.png", p1, width=10, height=6.25, dpi=80)
ggsave("plot_2.png", p2, width=10, height=6.25, dpi=80)
ggsave("plot_3.png", p3, width=10, height=6.25, dpi=80)
ggsave("plot_4.png", p4, width=10, height=6.25, dpi=80)
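The loop that builds standing_data could probably also be written without an explicit loop. The following is a sketch of that idea on a made-up `toy` data frame (values invented; column names as in this answer), using sorting plus `duplicated()` to keep each student's earliest complete row. I have not run it against the original data.

```r
# Toy data, invented for illustration (not the real df1).
toy = data.frame(
  id       = c(1, 1, 1, 2, 2, 3, 3),
  term     = c(2, 1, 3, 1, 2, 1, 2),
  standing = c("Good", NA, "Probation", "Probation", "Good",
               "Suspended", "Suspended"),
  cohort   = c("2009", "2009", "2009", "2009", "2009", "2010", "2010")
)

# Drop incomplete rows, then sort so each student's earliest term comes first.
ok = toy[complete.cases(toy), ]
ok = ok[order(ok$id, ok$term), ]

# duplicated() flags every row after the first per id, so negating it
# keeps exactly the earliest complete row for each student.
first = ok[!duplicated(ok$id), c("id", "standing", "cohort")]
names(first)[2] = "initial_standing"
print(first)
```

Student 1 has no standing in term 1, so the initial standing is taken from term 2, which matches what the loop does after its `complete.cases()` filter.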