dcast with custom fun.aggregate

牧云@^-^@ submitted on 2019-12-10 10:45:43

Question


I have data that looks like this:

sample  start  end  gene  coverage
X       1      10   A     5
X       11     20   A     10
Y       1      10   A     5
Y       11     20   A     10
X       1      10   B     5
X       11     20   B     10
Y       1      10   B     5
Y       11     20   B     10

I added additional columns:

data$length <- (data$end - data$start + 1)

data$ct_lt <- (data$length * data$coverage)

I reformatted my data using dcast:

casted <- dcast(data, gene ~ sample, value.var = "coverage", fun.aggregate = mean)

So my new data looks like this:

gene    X       Y
A      10.00000 10.00000
B      38.33333 38.33333

This is the format I want, but I would like to aggregate differently. Instead of a plain mean, I would like to take a weighted average of coverage, weighted by length:

sum(ct_lt) / sum(length)
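
As a quick sanity check of this formula, here is a minimal sketch in R for a single group, using the two sample X / gene A rows from the excerpt above. The variable names (`len`, `weighted_avg`, `same_as_builtin`) are illustrative and not from the original code; `len` is used instead of `length` only to avoid shadowing the base function.

```r
# Two rows for sample X, gene A from the excerpt above
start    <- c(1, 11)
end      <- c(10, 20)
coverage <- c(5, 10)

len   <- end - start + 1      # interval lengths: 10 and 10
ct_lt <- len * coverage       # coverage times length: 50 and 100

# Length-weighted average of coverage
weighted_avg <- sum(ct_lt) / sum(len)

# Base R's weighted.mean() computes the same quantity
same_as_builtin <- isTRUE(all.equal(weighted_avg, weighted.mean(coverage, len)))
```

Because both intervals have the same length here, the weighted average (150 / 20 = 7.5) coincides with the plain mean; the two differ only when the interval lengths differ.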

How do I go about doing this?


Answer 1:


Disclosure: I have no R in front of me, so this is untested, but I think the dplyr and tidyr packages may be your friends here.

There are certainly many ways to accomplish this, but I think the following might get you started:

library(dplyr)
library(tidyr)

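data %>%
  select(gene, sample, ct_lt, length) %>%
  # one group per gene/sample combination
  group_by(gene, sample) %>%
  # length-weighted average of coverage within each group
  summarise(weight_avg = sum(ct_lt) / sum(length)) %>%
  # one column per sample, as in the dcast output
  spread(sample, weight_avg)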
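
Since the snippet above was written without R at hand, here is a self-contained sketch of the same idea that rebuilds the sample data from the question so it can be run directly. It assumes dplyr >= 1.0.0 and tidyr >= 1.0.0, and it swaps `spread()` for `pivot_wider()`, its newer replacement in tidyr; the result name `wide` is illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
data <- read.table(header = TRUE, text = "
sample start end gene coverage
X 1 10 A 5
X 11 20 A 10
Y 1 10 A 5
Y 11 20 A 10
X 1 10 B 5
X 11 20 B 10
Y 1 10 B 5
Y 11 20 B 10
")

wide <- data %>%
  # helper columns, as defined in the question
  mutate(length = end - start + 1,
         ct_lt  = length * coverage) %>%
  group_by(gene, sample) %>%
  # length-weighted average of coverage per gene/sample
  summarise(weight_avg = sum(ct_lt) / sum(length), .groups = "drop") %>%
  # one row per gene, one column per sample
  pivot_wider(names_from = sample, values_from = weight_avg)
```

On this excerpt every gene/sample cell works out to 7.5, because all intervals have length 10.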

Hope this helps...
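
If you would rather stay with dcast: as far as I know, fun.aggregate only receives the values of the value.var column for each cell, so it cannot see the lengths directly. One possible workaround, sketched here under that assumption, is to cast the two sums separately with reshape2 and divide them. The names `num`, `den`, and `weighted` are illustrative.

```r
library(reshape2)

# Sample data copied from the question
data <- read.table(header = TRUE, text = "
sample start end gene coverage
X 1 10 A 5
X 11 20 A 10
Y 1 10 A 5
Y 11 20 A 10
X 1 10 B 5
X 11 20 B 10
Y 1 10 B 5
Y 11 20 B 10
")

# Helper columns, as defined in the question
data$length <- data$end - data$start + 1
data$ct_lt  <- data$length * data$coverage

# Numerator and denominator of the weighted average, cast to the same shape
num <- dcast(data, gene ~ sample, value.var = "ct_lt",  fun.aggregate = sum)
den <- dcast(data, gene ~ sample, value.var = "length", fun.aggregate = sum)

# Divide the sample columns element-wise and reattach the gene column
weighted <- cbind(num["gene"], num[-1] / den[-1])
```

Both casts use the same formula on the same data, so their rows and columns line up and the element-wise division is safe.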



Source: https://stackoverflow.com/questions/28080043/dcast-with-custom-fun-aggregate
