Good Ways to Visualize Longitudinal Categorical Data in R

爱一瞬间的悲伤 2021-01-29 23:42

[Update: Although I've accepted an answer, please add another answer if you have additional visualization ideas (whether in R or another language/program). Tex

3 Answers
  •  失恋的感觉
    2021-01-30 00:15

    Here are a few ideas for plotting your data. I've used ggplot2, and I've reformatted the data a bit in places.
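
    The original df1 is not reproduced in this answer. To try the code further down, a stand-in can be built along these lines. This is only a sketch: the column names (id, term, cohort, standing, termGPA) are taken from the plotting code, but the standing labels, cohort labels and all values are invented, and cohort and standing are assumed to be factors because the code calls levels() on them.

    ```r
    # Hypothetical stand-in for df1; every value here is invented for illustration.
    set.seed(1)
    standing_levels <- c("Good", "Probation", "Suspended")  # assumed labels
    df1 <- expand.grid(id = 1:6, term = 1:4)                 # 6 students x 4 terms
    df1$cohort <- factor(ifelse(df1$id <= 3, "2009", "2010"))
    df1$standing <- factor(sample(standing_levels, nrow(df1), replace = TRUE),
                           levels = standing_levels)
    df1$termGPA <- round(runif(nrow(df1), 1.5, 4.0), 2)
    # Blank out a few GPAs to mimic rows that have a standing but no termGPA.
    df1$termGPA[c(3, 10, 17)] <- NA
    str(df1)
    ```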

    Figure 1

    I've used a stacked bar plot, normalized to proportions with position="fill", to mimic your mosaic plot and solve the alignment issue.

    Figure 2

    Data points for each student are connected by a gray line, making this reminiscent of a parallel coordinates plot. Coloring the points shows the categorical standing. Using GPA on the y-axis helps spread out the points to reduce overplotting, and shows correlation of standing and GPA. A major problem is that many valid standing datapoints drop out because they lack a matching termGPA value.
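
    Before relying on a plot like Figure 2, it may be worth counting how many points are lost this way. A minimal sketch on invented toy data (the data frame toy and its values are hypothetical; only the column names follow df1):

    ```r
    # Toy data (invented): standing is known, but some termGPA values are missing.
    toy <- data.frame(
      id       = c(1, 1, 1, 2, 2, 2),
      term     = c(1, 2, 3, 1, 2, 3),
      standing = c("Good", "Probation", "Good", "Good", "Good", "Suspended"),
      termGPA  = c(3.1, NA, 2.8, 3.6, 3.4, NA)
    )
    # Rows Figure 2 cannot show: a valid standing with no GPA to plot it against.
    dropped <- !is.na(toy$standing) & is.na(toy$termGPA)
    sum(dropped)              # number of points lost (2 in this toy data)
    table(toy$term[dropped])  # which terms lose them
    ```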

    Figure 3

    Here I've created a new variable called initial_standing to use for facetting. Each panel contains students who match in both cohort and initial_standing. Plotting the id as text makes this figure a bit cluttered, but could be useful in some cases.

    Figure 4

    This plot is like a heatmap where each row is a student. I controlled the order of the id axis to force initial_standing and cohort groupings to stay together. If you have many more rows, you may want to consider sorting rows by some type of clustering.
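
    One possible way to sort rows by clustering is sketched here with base R only. It is not what the figure above uses. The toy data are invented, the distance (share of terms in which two students differ) is just one reasonable choice for categorical sequences, and the sketch assumes every pair of students shares at least one observed term.

    ```r
    # Toy data (invented): one row per student-term, standing as a factor.
    toy <- data.frame(
      id       = rep(c("a", "b", "c", "d"), each = 3),
      term     = rep(1:3, times = 4),
      standing = factor(c("Good", "Good", "Good",
                          "Probation", "Suspended", "Suspended",
                          "Good", "Good", "Probation",
                          "Probation", "Suspended", "Good"))
    )
    # Wide matrix of standing codes: one row per student, one column per term.
    wide <- tapply(as.integer(toy$standing), list(toy$id, toy$term),
                   function(x) x[1])
    # Hamming-style distance: share of terms in which two students differ.
    n <- nrow(wide)
    d <- matrix(0, n, n, dimnames = list(rownames(wide), rownames(wide)))
    for (i in seq_len(n)) for (j in seq_len(n)) {
      d[i, j] <- mean(wide[i, ] != wide[j, ], na.rm = TRUE)
    }
    hc <- hclust(as.dist(d), method = "average")
    # Use the dendrogram order as the factor level order for the y-axis.
    toy$id <- factor(toy$id, levels = rownames(wide)[hc$order])
    levels(toy$id)
    ```

    Passing the reordered id factor to aes(y=id) in a geom_tile plot would then place students with similar standing sequences next to each other.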

    library(ggplot2)
    
    # Create new data frame for determining initial standing.
    standing_data = data.frame(id=unique(df1$id), initial_standing=NA, cohort=NA)
    
    for (i in 1:nrow(standing_data)) {
        id = standing_data$id[i]
        subdat = df1[df1$id == id, ]
        subdat = subdat[complete.cases(subdat), ]
        initial_standing = subdat$standing[which.min(subdat$term)]
        standing_data[i, "initial_standing"] = as.character(initial_standing)
        standing_data[i, "cohort"] = as.character(subdat$cohort[1])
    }
    
    standing_data$cohort = factor(standing_data$cohort, levels=levels(df1$cohort))
    standing_data$initial_standing = factor(standing_data$initial_standing,
                                            levels=levels(df1$standing))
    
    # Add the new column (initial_standing) to df1.
    df1 = merge(df1, standing_data[, c("id", "initial_standing")], by="id")
    
    # Remove rows where standing is missing. Make some plots tidier.
    df1 = df1[!is.na(df1$standing), ]
    
    # Create id factor, controlling the sort order of the levels.     
    id_order = order(standing_data$initial_standing, standing_data$cohort)
    df1$id = factor(df1$id, levels=as.character(standing_data$id)[id_order])
    
    
    # Outline width uses linewidth (ggplot2 >= 3.4.0); older versions used size.
    p1 = ggplot(df1, aes(x=term, fill=standing)) +
         geom_bar(position="fill", colour="grey20", linewidth=0.5, width=1.0) +
         facet_grid(cohort ~ .) +
         scale_fill_brewer(palette="Set1")
    
    p2 = ggplot(df1, aes(x=term, y=termGPA, group=id)) + 
         geom_line(colour="grey70") + 
         geom_point(aes(colour=standing), size=4) + 
         facet_grid(cohort ~ .) +
         scale_colour_brewer(palette="Set1")
    
    p3 = ggplot(df1, aes(x=term, y=termGPA, group=id)) +
         geom_line(colour="grey70") + 
         geom_point(aes(colour=standing), size=4) + 
         geom_text(aes(label=id), hjust=-0.30, size=3) +
         facet_grid(initial_standing ~ cohort) +
         scale_colour_brewer(palette="Set1")
    
    
    p4 = ggplot(df1, aes(x=term, y=id, fill=standing)) + 
         geom_tile(colour="grey20") +
         facet_grid(initial_standing ~ ., space="free_y", scales="free_y") +
         scale_fill_brewer(palette="Set1") +
         theme(panel.grid.major=element_blank(),
               panel.grid.minor=element_blank())
    
    ggsave("plot_1.png", p1, width=10, height=6.25, dpi=80)
    ggsave("plot_2.png", p2, width=10, height=6.25, dpi=80)
    ggsave("plot_3.png", p3, width=10, height=6.25, dpi=80)
    ggsave("plot_4.png", p4, width=10, height=6.25, dpi=80)
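
    The loop that fills standing_data can also be written without explicit iteration. A minimal sketch on invented toy data follows (the data frame toy is hypothetical and omits cohort). It should give the same initial standing as the loop, assuming each id has at least one complete row.

    ```r
    # Toy data (invented) with the same column names as df1, minus cohort.
    toy <- data.frame(
      id       = c(1, 1, 1, 2, 2, 2),
      term     = c(2, 1, 3, 1, 2, 3),
      standing = factor(c("Good", NA, "Probation", "Probation", "Good", "Good")),
      termGPA  = c(3.0, NA, 2.1, 1.9, 2.8, 3.2)
    )
    # Keep complete rows, sort by id and term, then take the first row per id.
    ok <- toy[complete.cases(toy), ]
    ok <- ok[order(ok$id, ok$term), ]
    first_rows <- ok[!duplicated(ok$id), c("id", "standing")]
    names(first_rows)[2] <- "initial_standing"
    first_rows
    ```

    The result can be merged back by id in the same way as standing_data above.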
    
