问题
I have a bunch of data in a table (imported from csv) in the following format:
date classes score
9/1/11 french 34
9/1/11 english 34
9/1/11 french 34
9/1/11 spanish 34
9/2/11 french 34
9/2/11 english 34
9/3/11 spanish 34
9/3/11 spanish 34
9/5/11 spanish 34
9/5/11 english 34
9/5/11 french 34
9/5/11 english 34
Ignore the score column, it's not important.
I need a tally of the total number of students taking English or Spanish or french class based on date, ie. I need to first group it by date and then divide each day into further blocks based on language and plot it as a stacked bar chart so it looks like the following. Each bar represents a date and each cross section of a bar represents a single language.
I've figured out how to do this once I get the data in a matrix form where each row represents a date and every column an attribute (or language). So I assuming the data is in that form in a csv:
ie french english spanish
9/1/11 2 1 1
9/2/11 1 1 0
9/3/11 0 0 2
9/5/11 1 2 1
then I can do:
directory<-"C:\\test\\language.csv"
ourdata6<-read.csv(directory)
language<-as.matrix(ourdata6)
barchart(prop.table(language), horizontal=FALSE, auto.key = list(space='right',cex=.5,border=T,points=F, lines=F,lwd=5,text=c('french','spanish','enligsh'),cex=.6), main = list(label="Distribution of classes 10",cex=2.5), ylab = list(", cex=1.7),xlab.top=list("testing",cex=1.2))
The challenge is to get the data from the original format into the format I need.
I tried
a<-count(language, c("date", "classes"))
where it gives me the counts sorted by both but its in a vertical form
ie
9/1/11 french 2
9/1/11 english 1
9/1/11 spanish 1
etc...
I need to pivot this so it becomes a single row per date. Also if some of these might be zero so I need placeholders for them ie. the first column must correspond to french, the second must correspond to english for my current setup to work.
Any ideas on how to do this or if my approach with matrix + prop.table is even correct? Are there any simpler ways of doing this?
回答1:
Supposing your data is in a dataframe called df
, you can do that with the help of the dplyr
and tidyr
packages:
library(dplyr)
library(tidyr)
wide <- df %>% select(date,classes) %>%
group_by(date,classes) %>%
summarise(n=n()) %>% # as @akrun said, you can also use tally()
spread(classes, n, fill=0)
Using the example data you provided, this results in the following dataframe:
date english french spanish
9/1/11 1 2 1
9/2/11 1 1 0
9/3/11 0 0 2
9/5/11 2 1 1
Now you can make a lattice
plot with:
barchart(date ~ english + french + spanish, data=wide, stack = TRUE,
main = list(label="Distribution of language classes",cex=1.6),
xlab = list("Number of classes", cex=1.1),
ylab = list("Date", cex=1.1),
auto.key = list(space='right',cex=1.2,text=c('Enligsh','French','Spanish')))
which gives the following plot:
EDIT: Instead of using lattice-plots, you can also use ggplot2
, which is (at least in my opinion) easier to understand. An example:
# convert the wide dataframe to a long one
long <- wide %>% gather(class, n, -date)
# load ggplot2
library(ggplot2)
# create the plot
ggplot(long, aes(date, n, fill=class)) +
geom_bar(stat="identity", position="stack") +
coord_flip() +
theme_bw() +
theme(axis.title=element_blank(), axis.text=element_text(size=12))
which gives:
回答2:
I hope I'm not missing anything, but it looks to me like you're just looking for table
:
table(df[c("date", "classes")])
# classes
# date english french spanish
# 9/1/11 1 2 1
# 9/2/11 1 1 0
# 9/3/11 0 0 2
# 9/5/11 2 1 1
The result is a table
(which is also a matrix
) so you can use your barchart
command as you wish.
Here's what I get--looks like you need to work on your legend :-)
The code used was:
language <- table(df[c("date", "classes")])
barchart(prop.table(language),
horizontal = FALSE,
auto.key = list(space = 'right',
cex = .5, border = T, points = F,
lines = F, lwd = 5,
text = c('french','spanish','enligsh'),
cex = .6),
main = list(label = "Distribution of classes 10", cex = 2.5),
ylab = list("", cex = 1.7),
xlab.top = list("testing", cex = 1.2))
来源:https://stackoverflow.com/questions/25935202/how-to-reshape-data-for-a-stacked-barchart-using-r-lattice