Ordering stacks by size in a ggplot2 stacked bar graph

☆樱花仙子☆ submitted on 2019-12-17 19:19:38

Question


So I have a large data set, a sample of which is shown below:

Sequence  Abundance  Length
CAGTG             3      25
CGCTG            82      23
GGGAC             4      25
CTATC            16      23
CTTGA            14      25
CAAGG             9      24
GTAAT             5      24
ACGAA            32      22
TCGGA            10      22
TAGGC            30      21
TGCCG            25      21
TCCGG             2      21
CGCCT            22      24
TTGGC             4      22
ATTCC             4      23
ATTCC    4       23

I'm only showing the first five letters of each sequence here, but in reality each sequence is "Length" letters long. I am looking at the abundance of the sequences in each size class. I also want to visualise what proportion of the total abundance in its size class each sequence accounts for. Currently, I can make a stacked bar graph like this:

library(ggplot2)

ggplot(tab, aes(x=Length, y=Abundance, fill=Sequence)) +
  geom_bar(stat='identity') +
  theme(legend.position="none")  # theme() replaces opts(), which ggplot2 has since removed

This works for a small data set like this one, but my actual data set has about 1.7 million rows. The result is very colourful, and I can see that particular sequences account for most of the abundance in some size classes, but it is very messy.

I would like to order the coloured segments of each stack by that sequence's abundance, i.e. the segments with the highest abundance sit at the bottom of each stack and those with the lowest abundance at the top. It should look a lot more presentable that way.

Any ideas on how to do this in ggplot2? I know there's an "order" aesthetic that can go in aes(), but I can't work out how to use it with data in the format I have.
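To pin down the ordering being asked for, here is a small base-R sketch (no ggplot2 involved) that lists each size class's sequences from the desired bottom of the stack to the top. The data frame `tab` follows the question's code; `stack_order` is just an illustrative name.

```r
tab <- data.frame(
  Sequence  = c("CAGTG", "CGCTG", "GGGAC", "CTATC", "CTTGA",
                "CAAGG", "GTAAT", "ACGAA", "TCGGA", "TAGGC",
                "TGCCG", "TCCGG", "CGCCT", "TTGGC", "ATTCC"),
  Abundance = c(3, 82, 4, 16, 14, 9, 5, 32, 10, 30, 25, 2, 22, 4, 4),
  Length    = c(25, 23, 25, 23, 25, 24, 24, 22, 22, 21, 21, 21, 24, 22, 23),
  stringsAsFactors = FALSE
)

# Split the rows by size class, then sort each class's sequences by
# decreasing abundance: element 1 belongs at the bottom of the stack
stack_order <- lapply(split(tab, tab$Length), function(d) {
  d$Sequence[order(d$Abundance, decreasing = TRUE)]
})

stack_order[["23"]]
# [1] "CGCTG" "CTATC" "ATTCC"
```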


Answer 1:


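The order in which the segments of a stacked barplot are drawn (bottom to top) in ggplot2 is based on the level order of the factor that defines the groups. So the Sequence factor must be reordered by Abundance. But to get the right stacking order, that order must then be reversed. (This describes ggplot2 at the time of writing. Since ggplot2 2.2.0, position_stack() stacks groups in reverse level order so that the stacks match the legend, which means the first level is drawn on top and the reversal step is no longer needed.)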

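# ab.tab is the data frame from the question (called tab there)
ab.tab$Sequence <- reorder(ab.tab$Sequence, ab.tab$Abundance)  # levels sorted by ascending abundance
ab.tab$Sequence <- factor(ab.tab$Sequence, levels=rev(levels(ab.tab$Sequence)))  # most abundant first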
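Applied to the question's sample rows (with `tab` standing in for `ab.tab`), the two lines above produce a single, global level order with the most abundant sequence first. A sketch to check that:

```r
tab <- data.frame(
  Sequence  = c("CAGTG", "CGCTG", "GGGAC", "CTATC", "CTTGA",
                "CAAGG", "GTAAT", "ACGAA", "TCGGA", "TAGGC",
                "TGCCG", "TCCGG", "CGCCT", "TTGGC", "ATTCC"),
  Abundance = c(3, 82, 4, 16, 14, 9, 5, 32, 10, 30, 25, 2, 22, 4, 4),
  Length    = c(25, 23, 25, 23, 25, 24, 24, 22, 22, 21, 21, 21, 24, 22, 23),
  stringsAsFactors = FALSE
)

# reorder() sorts the levels by abundance (ascending) ...
tab$Sequence <- reorder(tab$Sequence, tab$Abundance)
# ... and rev() flips them so the most abundant sequence is the first level
tab$Sequence <- factor(tab$Sequence, levels = rev(levels(tab$Sequence)))

head(levels(tab$Sequence), 3)
# [1] "CGCTG" "ACGAA" "TAGGC"
```

The levels are global rather than per stack, but a sequence's length is fixed, so each sequence appears in only one stack; ordering all levels by abundance therefore also orders every individual stack.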

Running your plotting code now gives the plot you requested:

ggplot(ab.tab, aes(x=Length, y=Abundance, fill=Sequence)) +
  geom_bar(stat='identity') +
  theme(legend.position="none")  # theme() replaces opts(), which ggplot2 has since removed
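Assuming ggplot2 2.2.0 or later (check with `packageVersion("ggplot2")`), the first level is stacked on top, so the ascending order from `reorder()` already puts the most abundant sequence at the bottom and the `rev()` step should be dropped. A sketch on the question's sample rows, using `geom_col()`, which is shorthand for `geom_bar(stat = "identity")`:

```r
library(ggplot2)

tab <- data.frame(
  Sequence  = c("CAGTG", "CGCTG", "GGGAC", "CTATC", "CTTGA",
                "CAAGG", "GTAAT", "ACGAA", "TCGGA", "TAGGC",
                "TGCCG", "TCCGG", "CGCCT", "TTGGC", "ATTCC"),
  Abundance = c(3, 82, 4, 16, 14, 9, 5, 32, 10, 30, 25, 2, 22, 4, 4),
  Length    = c(25, 23, 25, 23, 25, 24, 24, 22, 22, 21, 21, 21, 24, 22, 23),
  stringsAsFactors = FALSE
)

# Ascending levels: the most abundant sequence is the last level, which
# position_stack() (ggplot2 >= 2.2.0) draws at the bottom of each stack
tab$Sequence <- reorder(tab$Sequence, tab$Abundance)

p <- ggplot(tab, aes(x = Length, y = Abundance, fill = Sequence)) +
  geom_col() +
  theme(legend.position = "none")
print(p)
```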

I might recommend, however, something slightly different. Since you are suppressing the legend for the scale that maps fill colour to sequence, and your description suggests that you don't care about the specific sequences anyway (and there will be many), why not leave the colour out altogether? Just draw the outlines of the bars without any fill colour:

ggplot(ab.tab, aes(x=Length, y=Abundance, group=Sequence)) +
  geom_bar(stat='identity', colour="black", fill=NA)
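The question also mentions the share of its size class that each sequence represents. One possible variant, not part of the answer above and again assuming ggplot2 2.2.0 or later: `position = "fill"` rescales every stack to a height of 1, so each outlined segment shows a proportion, and ascending levels keep the largest share at the bottom.

```r
library(ggplot2)

tab <- data.frame(
  Sequence  = c("CAGTG", "CGCTG", "GGGAC", "CTATC", "CTTGA",
                "CAAGG", "GTAAT", "ACGAA", "TCGGA", "TAGGC",
                "TGCCG", "TCCGG", "CGCCT", "TTGGC", "ATTCC"),
  Abundance = c(3, 82, 4, 16, 14, 9, 5, 32, 10, 30, 25, 2, 22, 4, 4),
  Length    = c(25, 23, 25, 23, 25, 24, 24, 22, 22, 21, 21, 21, 24, 22, 23),
  stringsAsFactors = FALSE
)

# Ascending levels put the most abundant sequence at the bottom (ggplot2 >= 2.2.0)
tab$Sequence <- reorder(tab$Sequence, tab$Abundance)

# position = "fill" scales each stack so that its segments sum to 1
p <- ggplot(tab, aes(x = Length, y = Abundance, group = Sequence)) +
  geom_col(position = "fill", colour = "black", fill = NA) +
  labs(y = "Proportion of abundance within size class")
print(p)
```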



Source: https://stackoverflow.com/questions/9227389/ordering-stacks-by-size-in-a-ggplot2-stacked-bar-graph
