Comparing multiple columns in different data sets to find values within range R

浪子不回头ぞ submitted on 2020-01-10 06:01:07

Question


I have two datasets: one called domain (d), which has general information about a gene, and a table called mutation (m). Both tables have a similar column called Gene.name, which I'll use for the lookup. The two datasets do not have the same number of columns or rows.

I want to go through all the data in the file mutation and check whether the value found in column Gene.name also exists in the file domain. If it does, I want to check whether the value in column Mutation is between the columns "Start" and "End" (it can be equal to Start or End). If it is, I want to write it out to a new table with the merged columns: Gene.name, Mutation, and the domain information. If it doesn't exist, ignore it.

So this is what I have so far:

d<-read.table("domains.txt", header=TRUE)

d
Gene.name Domain Start  End
ABCF1   low_complexity_region   2   13
DKK1    low_complexity_region   25  39
ABCF1   AAA 328 532
F2  coiled_coil_region  499 558

m<-read.table("mutations.tx", header=TRUE)

m
Gene.name   Mutation        
ABCF1   10      
DKK1    21      
ABCF1   335     
xyz 15      
F2  499     

newfile<-m[, list(new=findInterval(d(c(d$Start, d$End)),by'=Gene.Name']

My code isn't working, and after reading a lot of different questions and answers I'm even more confused. Any help would be great.

I'd like my final data to look like this:

Gene.name   Mutation    Domain  
DKK1    21  low_complexity_region   
ABCF1   335 AAA 
F2  499 coiled_coil_region  

Answer 1:


A merge and subset should get you there (though I think your intended result doesn't match your description of what you want):

result <- merge(d,m,by="Gene.name")
result[with(result,Mutation >= Start & Mutation <= End),]

#  Gene.name                Domain Start End Mutation
#1     ABCF1 low_complexity_region     2  13       10
#4     ABCF1                   AAA   328 532      335
#6        F2    coiled_coil_region   499 558      499
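
For anyone who wants to try this without the input files, here is a minimal self-contained sketch of the same merge-and-subset idea. It assumes the two tables look exactly like the ones printed in the question and rebuilds them inline with data.frame(); the names merged and hits are only illustrative. It also shows why the intended result in the question differs from the description: DKK1 has mutation 21, which falls outside its only domain (25 to 39), while ABCF1 mutation 10 falls inside 2 to 13.

```r
# Rebuild the question's tables inline (assumed to match the printed data)
d <- data.frame(
  Gene.name = c("ABCF1", "DKK1", "ABCF1", "F2"),
  Domain    = c("low_complexity_region", "low_complexity_region",
                "AAA", "coiled_coil_region"),
  Start     = c(2, 25, 328, 499),
  End       = c(13, 39, 532, 558),
  stringsAsFactors = FALSE
)
m <- data.frame(
  Gene.name = c("ABCF1", "DKK1", "ABCF1", "xyz", "F2"),
  Mutation  = c(10, 21, 335, 15, 499),
  stringsAsFactors = FALSE
)

# Inner join: genes missing from d (here "xyz") drop out, and each
# mutation is paired with every domain of the same gene
merged <- merge(m, d, by = "Gene.name")

# Keep only pairs where the mutation lies in [Start, End], inclusive,
# and only the columns the question asks for
in_range <- merged$Mutation >= merged$Start & merged$Mutation <= merged$End
hits <- merged[in_range, c("Gene.name", "Mutation", "Domain")]
print(hits)
```

Because merge() pairs every mutation with every domain of the same gene before filtering, the intermediate table can get large when genes have many domains and many mutations; for tables of this size that is not a concern.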


Source: https://stackoverflow.com/questions/20670294/comparing-multiple-columns-in-different-data-sets-to-find-values-within-range-r
