Question
I am facing a problem with a dataset which has overlapping factor levels.
I would like to produce timelines, bar plots and statistics by factor level. However, I want the factor levels to be non-exclusive: observations belonging to more than one level should appear several times in a plot.
Here is an example of what my data structure looks like:
head <- c("ID","YEAR","BRAZIL","GERMANY","US","FRANCE")
data <- data.frame(matrix(c(1,2000,1,0,0,0,
2,2010,0,1,1,0,
3,2011,0,1,0,0,
4,2012,1,0,0,1,
5,2012,0,1,0,0,
6,2013,0,0,0,1),
nrow=6, ncol=6, byrow=T))
names(data) <- head
Obviously, a factor variable "COUNTRY" cannot be created the usual way, because that would force the factor levels to be mutually exclusive (in our case there would be 4 levels: Brazil, Germany, US and France):
data$COUNTRY[data$BRAZIL==1 &
data$GERMANY==0 &
data$US==0 &
data$FRANCE==0] <- "Brazil"
data$COUNTRY[data$BRAZIL==0 &
data$GERMANY==1 &
data$US==0 &
data$FRANCE==0] <- "Germany"
etc...
factor(data$COUNTRY)
But this is not what I want.
My problem is that plotting by factor only works if factor levels are properly unambiguous. I would like to produce something like this:
require(ggplot2)
MYPLOT <- qplot(data$YEAR, data$COUNTRY)
MYPLOT + geom_point(aes(size=..count..), stat="bin") + scale_size(range=c(0, 15))
with observations belonging to i factor levels appearing i times in the plot.
- How should I transform my data.frame in order to get what I desire?
- Should I simply duplicate those observations belonging to i factor levels i times? If yes, how should I do that?
- Is there a workaround that does not require duplicating cases?
Ideas anyone?
Answer 1:
I think you have to duplicate those rows so that each row represents a single observation-country pair, and then remove any rows with a value of 0.
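library(reshape2)
# Reshape to long format: one row per (ID, YEAR, country) combination
d2 <- melt(data, id.vars=c("ID","YEAR"))
# Keep only the rows where the observation actually belongs to the country
d3 <- d2[d2$value!=0,]
library(ggplot2)
# An observation belonging to i countries now appears i times
qplot(d3$YEAR, d3$variable)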
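For comparison, here is a minimal sketch of the same idea in base R, without reshape2. The names `countries`, `long`, `per_country` and `by_year` are illustrative and not from the original post. The plotting step swaps `qplot` for `geom_count()`, which sizes each point by the number of rows at that position, and it assumes ggplot2 is installed. The last two statements hint at an answer to the "no duplication" question: simple counts can be taken directly from the indicator columns.

```r
# Example data from the question (indicator columns: 1 = belongs to that country)
data <- data.frame(
  ID      = 1:6,
  YEAR    = c(2000, 2010, 2011, 2012, 2012, 2013),
  BRAZIL  = c(1, 0, 0, 1, 0, 0),
  GERMANY = c(0, 1, 1, 0, 1, 0),
  US      = c(0, 1, 0, 0, 0, 0),
  FRANCE  = c(0, 0, 0, 1, 0, 1)
)
countries <- c("BRAZIL", "GERMANY", "US", "FRANCE")

# Long format: one row per (observation, country) membership,
# so an observation with i countries appears i times
long <- do.call(rbind, lapply(countries, function(cn) {
  rows <- data[data[[cn]] == 1, c("ID", "YEAR")]
  rows$COUNTRY <- rep(cn, nrow(rows))
  rows
}))
long$COUNTRY <- factor(long$COUNTRY, levels = countries)

# Counts that do not need duplicated rows at all
per_country <- colSums(data[countries])                 # total per country
by_year     <- rowsum(data[countries], group = data$YEAR)  # per year and country

# Optional plot, only built if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = YEAR, y = COUNTRY)) +
    ggplot2::geom_count() +
    ggplot2::scale_size_area(max_size = 15)
}
```

Printing `p` draws the plot. The long format is still the more convenient input for ggplot2, while `per_country` and `by_year` are enough for tables and bar plots of counts.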
Source: https://stackoverflow.com/questions/21391304/how-to-deal-with-overlapping-factor-levels-e-g-when-producing-tables-and-plot