How to reshape data for a stacked barchart using R lattice [duplicate]

只谈情不闲聊 提交于 2020-01-15 11:15:12

问题


I have a bunch of data in a table (imported from csv) in the following format:

date        classes         score
9/1/11       french          34
9/1/11       english         34
9/1/11       french          34
9/1/11       spanish         34
9/2/11       french          34
9/2/11       english         34
9/3/11       spanish         34
9/3/11       spanish         34
9/5/11       spanish         34
9/5/11       english         34
9/5/11       french          34
9/5/11       english         34

Ignore the score column, it's not important.

I need a tally of the total number of students taking English or Spanish or french class based on date, ie. I need to first group it by date and then divide each day into further blocks based on language and plot it as a stacked bar chart so it looks like the following. Each bar represents a date and each cross section of a bar represents a single language.

I've figured out how to do this once I get the data in a matrix form where each row represents a date and every column an attribute (or language). So I assuming the data is in that form in a csv:

ie           french      english       spanish
9/1/11       2           1             1
9/2/11       1           1             0          
9/3/11       0           0             2
9/5/11       1           2             1

then I can do:

directory<-"C:\\test\\language.csv"
ourdata6<-read.csv(directory)

language<-as.matrix(ourdata6)

barchart(prop.table(language), horizontal=FALSE, auto.key = list(space='right',cex=.5,border=T,points=F, lines=F,lwd=5,text=c('french','spanish','enligsh'),cex=.6), main = list(label="Distribution of classes 10",cex=2.5),  ylab = list(", cex=1.7),xlab.top=list("testing",cex=1.2))

The challenge is to get the data from the original format into the format I need.

I tried

a<-count(language, c("date", "classes"))

where it gives me the counts sorted by both but its in a vertical form

ie
9/1/11       french           2             
9/1/11       english          1                       
9/1/11       spanish          1            
etc...

I need to pivot this so it becomes a single row per date. Also if some of these might be zero so I need placeholders for them ie. the first column must correspond to french, the second must correspond to english for my current setup to work.

Any ideas on how to do this or if my approach with matrix + prop.table is even correct? Are there any simpler ways of doing this?


回答1:


Supposing your data is in a dataframe called df, you can do that with the help of the dplyr and tidyr packages:

library(dplyr)
library(tidyr)

wide <- df %>% select(date,classes) %>%
  group_by(date,classes) %>%
  summarise(n=n()) %>%            # as @akrun said, you can also use tally()
  spread(classes, n, fill=0)

Using the example data you provided, this results in the following dataframe:

  date english french spanish
9/1/11       1      2       1
9/2/11       1      1       0
9/3/11       0      0       2
9/5/11       2      1       1

Now you can make a lattice plot with:

barchart(date ~ english + french + spanish, data=wide, stack = TRUE,
         main = list(label="Distribution of language classes",cex=1.6),
         xlab = list("Number of classes", cex=1.1),
         ylab = list("Date", cex=1.1),
         auto.key = list(space='right',cex=1.2,text=c('Enligsh','French','Spanish')))

which gives the following plot:


EDIT: Instead of using lattice-plots, you can also use ggplot2, which is (at least in my opinion) easier to understand. An example:

# convert the wide dataframe to a long one
long <- wide %>% gather(class, n, -date)

# load ggplot2
library(ggplot2)

# create the plot
ggplot(long, aes(date, n, fill=class)) +
  geom_bar(stat="identity", position="stack") +
  coord_flip() +
  theme_bw() +
  theme(axis.title=element_blank(), axis.text=element_text(size=12))

which gives:




回答2:


I hope I'm not missing anything, but it looks to me like you're just looking for table:

table(df[c("date", "classes")])
#         classes
# date     english french spanish
#   9/1/11       1      2       1
#   9/2/11       1      1       0
#   9/3/11       0      0       2
#   9/5/11       2      1       1

The result is a table (which is also a matrix) so you can use your barchart command as you wish.

Here's what I get--looks like you need to work on your legend :-)

The code used was:

language <- table(df[c("date", "classes")])

barchart(prop.table(language), 
         horizontal = FALSE, 
         auto.key = list(space = 'right',
                         cex = .5, border = T, points = F, 
                         lines = F, lwd = 5, 
                         text = c('french','spanish','enligsh'),
                         cex = .6), 
         main = list(label = "Distribution of classes 10", cex = 2.5),
         ylab = list("", cex = 1.7), 
         xlab.top = list("testing", cex = 1.2))


来源:https://stackoverflow.com/questions/25935202/how-to-reshape-data-for-a-stacked-barchart-using-r-lattice

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!