How to deal with overlapping factor levels? (e.g. when producing tables and plots)

痴心易碎 submitted on 2019-12-08 01:59:40

Question


I am facing a problem with a dataset which has overlapping factor levels.

I would like to produce timelines, barplots and statistics by factor level. However, I want the factor levels to be non-exclusive: observations that belong to more than one level should appear several times in a plot, once for each level.

Here is an example of what my data structure looks like:

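# Column names: an ID, a year, and one 0/1 indicator column per country
head <- c("ID", "YEAR", "BRAZIL", "GERMANY", "US", "FRANCE")
data <- data.frame(matrix(c(1, 2000, 1, 0, 0, 0,
                            2, 2010, 0, 1, 1, 0,
                            3, 2011, 0, 1, 0, 0,
                            4, 2012, 1, 0, 0, 1,
                            5, 2012, 0, 1, 0, 0,
                            6, 2013, 0, 0, 0, 1),
                          nrow = 6, ncol = 6, byrow = TRUE))
names(data) <- head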
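A quick way to see where the levels overlap is to count, for each observation, how many indicator columns are set. The sketch below rebuilds the same example data with `data.frame()` (so it runs on its own; `ID` becomes an integer here) and uses illustrative names `countries` and `N_LEVELS` that are not part of the original question. Rows with `N_LEVELS > 1` are exactly the ones an ordinary, mutually exclusive factor cannot represent.

```r
# Rebuild the example data from the question so this snippet is self-contained
data <- data.frame(
  ID      = 1:6,
  YEAR    = c(2000, 2010, 2011, 2012, 2012, 2013),
  BRAZIL  = c(1, 0, 0, 1, 0, 0),
  GERMANY = c(0, 1, 1, 0, 1, 0),
  US      = c(0, 1, 0, 0, 0, 0),
  FRANCE  = c(0, 0, 0, 1, 0, 1)
)
countries <- c("BRAZIL", "GERMANY", "US", "FRANCE")  # illustrative helper name

# Number of country levels each observation belongs to (sum of its 0/1 indicators)
data$N_LEVELS <- rowSums(data[countries])

# Observations that belong to more than one level
data[data$N_LEVELS > 1, c("ID", "N_LEVELS")]
```

Here IDs 2 and 4 each belong to two countries, so a "duplicate i times" approach would produce 8 rows in total from the 6 observations.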

Obviously, a factor variable "COUNTRY" cannot be created the usual way, because that would force the factor levels to be mutually exclusive (in our case there would be 4 levels: Brazil, Germany, US and France):

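# Start with no level assigned; rows matching none of the conditions below stay NA
data$COUNTRY <- NA_character_
data$COUNTRY[data$BRAZIL == 1 & data$GERMANY == 0 &
             data$US == 0 & data$FRANCE == 0] <- "Brazil"
data$COUNTRY[data$BRAZIL == 0 & data$GERMANY == 1 &
             data$US == 0 & data$FRANCE == 0] <- "Germany"
data$COUNTRY[data$BRAZIL == 0 & data$GERMANY == 0 &
             data$US == 1 & data$FRANCE == 0] <- "US"
data$COUNTRY[data$BRAZIL == 0 & data$GERMANY == 0 &
             data$US == 0 & data$FRANCE == 1] <- "France"

factor(data$COUNTRY)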

But this is not what I want: the observations that belong to two countries (IDs 2 and 4) do not get any level at all.


My problem is that plotting by factor only works if the factor levels are mutually exclusive. I would like to produce something like this:

library(ggplot2)
# Bubble plot: point size = number of observations for each YEAR/COUNTRY pair
MYPLOT <- ggplot(data, aes(x = YEAR, y = COUNTRY)) + geom_count()
MYPLOT + scale_size(range = c(0, 15))

where each observation that belongs to i factor levels appears i times in the plot.

  • How should I transform my data.frame to get what I want?
  • Should I simply duplicate each observation that belongs to i factor levels i times? If so, how should I do that?
  • Is there a workaround that does not require duplicating cases?
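For the table part of the question, at least, counts per level can be computed straight from the 0/1 indicator columns without duplicating any rows: summing each indicator within a group counts an observation once for every level it belongs to. The sketch below uses base R's `rowsum()` and `colSums()` on the rebuilt example data; the names `countries` and `by_year` are illustrative. It does not by itself produce the one-row-per-point layout that ggplot2 plots expect.

```r
# Rebuild the example data from the question so this snippet is self-contained
data <- data.frame(
  ID      = 1:6,
  YEAR    = c(2000, 2010, 2011, 2012, 2012, 2013),
  BRAZIL  = c(1, 0, 0, 1, 0, 0),
  GERMANY = c(0, 1, 1, 0, 1, 0),
  US      = c(0, 1, 0, 0, 0, 0),
  FRANCE  = c(0, 0, 0, 1, 0, 1)
)
countries <- c("BRAZIL", "GERMANY", "US", "FRANCE")  # illustrative helper name

# Sum the indicator columns within each YEAR: one row per year, one column
# per country; an observation in two countries is counted in both columns
by_year <- rowsum(data[countries], group = data$YEAR)
by_year

# Overall count per country, again without duplicating any observation
colSums(data[countries])
```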

Ideas anyone?


Answer 1:


I think you have to duplicate those rows so that each country membership gets its own row, and then remove any rows where the indicator is 0.

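library(reshape2)
library(ggplot2)
# Long format: one row per (ID, YEAR, country); "value" holds the 0/1 indicator
d2 <- melt(data, id.vars = c("ID", "YEAR"),
           measure.vars = c("BRAZIL", "GERMANY", "US", "FRANCE"))
# Keep only the memberships that actually apply
d3 <- d2[d2$value != 0, ]
ggplot(d3, aes(x = YEAR, y = variable)) + geom_point()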
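The same duplication can be sketched in base R without reshape2, assuming the data looks like the example in the question: `which(..., arr.ind = TRUE)` returns one (row, column) pair for every 1 in the indicator matrix, so each observation is repeated once per level it belongs to. The names `countries`, `hits`, `long` and `tab` are illustrative, and the data is rebuilt so the snippet runs on its own.

```r
# Rebuild the example data from the question so this snippet is self-contained
data <- data.frame(
  ID      = 1:6,
  YEAR    = c(2000, 2010, 2011, 2012, 2012, 2013),
  BRAZIL  = c(1, 0, 0, 1, 0, 0),
  GERMANY = c(0, 1, 1, 0, 1, 0),
  US      = c(0, 1, 0, 0, 0, 0),
  FRANCE  = c(0, 0, 0, 1, 0, 1)
)
countries <- c("BRAZIL", "GERMANY", "US", "FRANCE")  # illustrative helper name
m <- as.matrix(data[countries])

# One (row, col) pair for every membership; sort so rows follow the original order
hits <- which(m == 1, arr.ind = TRUE)
hits <- hits[order(hits[, "row"], hits[, "col"]), , drop = FALSE]

# Long data: an observation in i countries appears i times
long <- data.frame(
  ID      = data$ID[hits[, "row"]],
  YEAR    = data$YEAR[hits[, "row"]],
  COUNTRY = factor(countries[hits[, "col"]], levels = countries)
)
long

# Cross-table of years by country; overlapping observations count once per level
tab <- table(YEAR = long$YEAR, COUNTRY = long$COUNTRY)
tab
```

With `long` in this shape, the bubble plot from the question should be obtainable with something like `ggplot(long, aes(x = YEAR, y = COUNTRY)) + geom_count() + scale_size(range = c(0, 15))`, since `geom_count()` sizes each point by the number of rows at that position.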


Source: https://stackoverflow.com/questions/21391304/how-to-deal-with-overlapping-factor-levels-e-g-when-producing-tables-and-plot
