Question
I have a question about filtering a dataset based on the sum of counts.
My file looks like this:
g1 a 2
g1 a 3
g1 a 0
g1 b 1
g2 b 3
g2 c 4
g2 d 9
g3 e 1
g3 f 3
g4 g 10
g4 h 18
g4 i 23
The first column holds gene names. For each gene I want to calculate the sum of the third column: for g1 it is 6, for g2 it is 16, and so on. I then want to keep only the rows of the input whose gene has a sum > 10, so that my output looks like this:
g2 b 3
g2 c 4
g2 d 9
g4 g 10
g4 h 18
g4 i 23
This is what I have tried so far:
tab <- read.table("input.txt", header=FALSE)
genelist <- split(tab,tab[,1])
How can I sum the counts per gene and keep only the genes whose sum is > 10? I think I have to use sapply to loop through the list, but I am stuck here. Any help is appreciated.
Answer 1:
Is this what you're looking for?
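library(dplyr)  # provides %>%, arrange(), group_by(), summarise(), filter()
# Simulate example data: 40 rows of random genes, labels and values
n_vars <- 40
gene <- sample(x = c("g1", "g2", "g3", "g4"), size = n_vars, replace = TRUE)
v1 <- sample(x = c("a", "b", "c", "d", "e", "f", "g"), size = n_vars, replace = TRUE)
result <- rnorm(n = n_vars, mean = 0, sd = 10)
# Sum result within each gene/v1 pair and keep the pairs whose total exceeds 10
df <- data.frame(gene, v1, result) %>%
  arrange(gene, v1) %>%
  group_by(gene, v1) %>%
  summarise(total = sum(result)) %>%
  filter(total > 10)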
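The pipeline above totals result per gene/v1 pair and returns summary rows. If the goal is instead to keep the original rows of every gene whose summed count exceeds 10, as the question describes, a minimal base R sketch with ave() might look like this. The column names gene, label and count are made up here for illustration, and the data frame simply reproduces the sample data from the question.

```r
# Sample data copied from the question; the column names are hypothetical.
tab <- data.frame(
  gene  = c("g1", "g1", "g1", "g1", "g2", "g2", "g2", "g3", "g3", "g4", "g4", "g4"),
  label = c("a", "a", "a", "b", "b", "c", "d", "e", "f", "g", "h", "i"),
  count = c(2, 3, 0, 1, 3, 4, 9, 1, 3, 10, 18, 23),
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the sum of count within that row's gene.
gene_total <- ave(tab$count, tab$gene, FUN = sum)

# Keep only the rows whose gene total exceeds 10.
filtered <- tab[gene_total > 10, ]
print(filtered)
```

With the file from the question, the same idea should apply to the data frame returned by read.table("input.txt", header=FALSE), using tab[,3] for the counts and tab[,1] for the gene names.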
Source: https://stackoverflow.com/questions/56052560/subsetting-a-dataset-in-r