subsetting a dataset in R [duplicate]

Submitted by 混江龙づ霸主 on 2019-12-11 19:28:21

Question


I have a question about filtering a dataset based on the sum of counts.

My file looks like this:

g1  a   2
g1  a   3
g1  a   0
g1  b   1
g2  b   3
g2  c   4
g2  d   9
g3  e   1
g3  f   3
g4  g   10
g4  h   18
g4  i   23

The first column holds gene names. I want to compute, from the third column, the sum for each gene: 6 for g1, 16 for g2, and so on. If a gene's sum is greater than 10, all of that gene's rows should be kept, so that my output looks like this:

g2  b   3
g2  c   4
g2  d   9
g4  g   10
g4  h   18
g4  i   23 

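This is what I have tried so far:

# read the tab-delimited input (base R has read.table; there is no read.data)
tab <- read.table("input.txt", header = FALSE)
# split the rows into one data frame per gene (first column)
genelist <- split(tab, tab[, 1])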

How can I sum the counts per gene and keep only the genes whose sum is > 10? I think I have to use sapply to loop through the list, but I am stuck here. Any help is appreciated.
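
For reference, one way the split()/sapply idea described above could be continued in base R is sketched below. It assumes the three columns get the default names V1, V2 and V3 that read.table assigns with header=FALSE; the data is typed in directly here instead of being read from input.txt, and gene_sums, keep and filtered are illustrative names only.

```r
# Sample data from the question, entered by hand (assumption: default V1..V3 names)
tab <- data.frame(
  V1 = c("g1", "g1", "g1", "g1", "g2", "g2", "g2", "g3", "g3", "g4", "g4", "g4"),
  V2 = c("a", "a", "a", "b", "b", "c", "d", "e", "f", "g", "h", "i"),
  V3 = c(2, 3, 0, 1, 3, 4, 9, 1, 3, 10, 18, 23)
)

# Split the count column by gene and sum each piece
gene_sums <- sapply(split(tab$V3, tab$V1), sum)

# Names of the genes whose total exceeds 10
keep <- names(gene_sums)[gene_sums > 10]

# Keep every original row that belongs to one of those genes
filtered <- tab[tab$V1 %in% keep, ]
print(filtered)
```

The same filter can probably be written in one line with ave(), which repeats each group's sum on every row of that group: tab[ave(tab$V3, tab$V1, FUN = sum) > 10, ].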


Answer 1:


Is this what you're looking for?

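library(dplyr)  # provides %>%, arrange, group_by, summarise, filter

# simulate 40 rows of example data
n_vars <- 40
gene <- sample(x = c("g1", "g2", "g3", "g4"), size = n_vars, replace = TRUE)
v1 <- sample(x = c("a", "b", "c", "d", "e", "f", "g"), size = n_vars, replace = TRUE)
result <- rnorm(n = n_vars, mean = 0, sd = 10)

# sum result for each gene/v1 pair and keep the pairs whose total exceeds 10
df <- data.frame(gene, v1, result) %>%
  arrange(gene, v1) %>%
  group_by(gene, v1) %>%
  summarise(total = sum(result)) %>%
  filter(total > 10)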
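
Note that the pipeline above sums per gene/v1 pair and returns one summary row per group, whereas the question asks for a per-gene total with the original rows kept. A minimal sketch of that variant, using a grouped filter() on the question's own data, might look like the following. The column names gene, v1 and count are assumptions chosen for illustration, not names from the original file.

```r
library(dplyr)

# Sample data from the question (column names are assumed for illustration)
tab <- data.frame(
  gene  = c("g1", "g1", "g1", "g1", "g2", "g2", "g2", "g3", "g3", "g4", "g4", "g4"),
  v1    = c("a", "a", "a", "b", "b", "c", "d", "e", "f", "g", "h", "i"),
  count = c(2, 3, 0, 1, 3, 4, 9, 1, 3, 10, 18, 23)
)

# Within each gene, sum(count) is evaluated once per group, so filter()
# keeps or drops all rows of a gene together
filtered <- tab %>%
  group_by(gene) %>%
  filter(sum(count) > 10) %>%
  ungroup()

print(filtered)
```

Because no summarise() step is involved, the original rows for g2 and g4 come through unchanged, which matches the output requested in the question.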


Source: https://stackoverflow.com/questions/56052560/subsetting-a-dataset-in-r
