Good Ways to Visualize Longitudinal Categorical Data in R

后端未结

关注

 3  1736

爱一瞬间的悲伤 2021-01-29 23:42

[Update: Although I\'ve accepted an answer, please add another answer if you have additional visualization ideas (whether in R or another language/program). Tex

3条回答

失恋的感觉 (楼主)

2021-01-30 00:15

Here are a few ideas for plotting your data. I've used ggplot2, and I've reformatted the data a bit in places.

Figure 1

enter image description here I've used a stacked barplot to mimic your mosaic plot and solve the alignment issue.

Figure 2

enter image description here Data points for each student are connected by a gray line, making this reminiscent of a parallel coordinates plot. Coloring the points shows the categorical standing. Using GPA on the y-axis helps spread out the points to reduce overplotting, and shows correlation of standing and GPA. A major problem is that many valid standing datapoints drop out because they lack a matching termGPA value.

Figure 3

enter image description here Here I've created a new variable called initial_standing to use for facetting. Each panel contains students who match in both cohort and initial_standing. Plotting the id as text makes this figure a bit cluttered, but could be useful in some cases.

Figure 4

enter image description here This plot is like a heatmap where each row is a student. I controlled the order of the id axis to force initial_standing and cohort groupings to stay together. If you have many more rows, you may want to consider sorting rows by some type of clustering.

library(ggplot2)

# Create new data frame for determining initial standing.
standing_data = data.frame(id=unique(df1$id), initial_standing=NA, cohort=NA)

for (i in 1:nrow(standing_data)) {
    id = standing_data$id[i]
    subdat = df1[df1$id == id, ]
    subdat = subdat[complete.cases(subdat), ]
    initial_standing = subdat$standing[which.min(subdat$term)]
    standing_data[i, "initial_standing"] = as.character(initial_standing)
    standing_data[i, "cohort"] = as.character(subdat$cohort[1])
}

standing_data$cohort = factor(standing_data$cohort, levels=levels(df1$cohort))
standing_data$initial_standing = factor(standing_data$initial_standing,
                                        levels=levels(df1$standing))

# Add the new column (initial_standing) to df1.
df1 = merge(df1, standing_data[, c("id", "initial_standing")], by="id")

# Remove rows where standing is missing. Make some plots tidier.
df1 = df1[!is.na(df1$standing), ]

# Create id factor, controlling the sort order of the levels.     
id_order = order(standing_data$initial_standing, standing_data$cohort)
df1$id = factor(df1$id, levels=as.character(standing_data$id)[id_order])


p1 = ggplot(df1, aes(x=term, fill=standing)) +
     geom_bar(position="fill", colour="grey20", size=0.5, width=1.0) +
     facet_grid(cohort ~ .) +
     scale_fill_brewer(palette="Set1")

p2 = ggplot(df1, aes(x=term, y=termGPA, group=id)) + 
     geom_line(colour="grey70") + 
     geom_point(aes(colour=standing), size=4) + 
     facet_grid(cohort ~ .) +
     scale_colour_brewer(palette="Set1")

p3 = ggplot(df1, aes(x=term, y=termGPA, group=id)) +
     geom_line(colour="grey70") + 
     geom_point(aes(colour=standing), size=4) + 
     geom_text(aes(label=id), hjust=-0.30, size=3) +
     facet_grid(initial_standing ~ cohort) +
     scale_colour_brewer(palette="Set1")


p4 = ggplot(df1, aes(x=term, y=id, fill=standing)) + 
     geom_tile(colour="grey20") +
     facet_grid(initial_standing ~ ., space="free_y", scales="free_y") +
     scale_fill_brewer(palette="Set1") +
     opts(panel.grid.major=theme_blank()) +
     opts(panel.grid.minor=theme_blank())

ggsave("plot_1.png", p1, width=10, height=6.25, dpi=80)
ggsave("plot_2.png", p2, width=10, height=6.25, dpi=80)
ggsave("plot_3.png", p3, width=10, height=6.25, dpi=80)
ggsave("plot_4.png", p4, width=10, height=6.25, dpi=80)

0 讨论(0)

查看其它3个回答