Filter overlapping entries in bed file

痴心易碎 submitted on 2019-12-22 09:50:47

Question


I have a bed file that looks like this:

1   183113  183114  chr1:183113-183240  0   +
1   187286  187287  chr1:187128-187287  0   -
1   187576  187587  chr1:187375-187577  0   -
1   187580  187590  chr1:187379-187577  0   -

My aim is to extract only those rows whose intervals do not overlap any other row. For some time I have been trying bedtools merge, following the docs. I wanted to use its flags to count the entries that contributed to each "merged" fragment and then keep only those with a count of 1, but here comes the problem: I don't know how to keep the strand, the score (this should always be 0) and the name (this might be reconstructed from the first 3 columns). Does anyone know how to put these things together?

The output should look exactly like the input BED above, but contain only the rows that do not overlap anything else:

1   183113  183114  chr1:183113-183240  0   +
1   187286  187287  chr1:187128-187287  0   -

Answer 1:


OK, I worked this out:

1) Count how many input rows contribute to each merged interval (bedtools merge expects the input to be sorted by chromosome, then by start)

bedtools merge -i IN.bed -c 1 -o count > counted

2) Keep only the merged intervals built from a single row, i.e. those that do not overlap anything

awk '/\t1$/{print}' counted > filtered

3) Intersect the filtered intervals with the original input; -wa writes out the original rows, so only the rows that survived the filter are kept, with name, score and strand intact

bedtools intersect -a IN.bed -b filtered -wa > OUT.bed
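
For cases where bedtools is not at hand, the same filtering can be sketched with sort and awk alone. This is a swapped-in technique, not the bedtools approach from the steps above: filter_isolated is a hypothetical helper name, and the sketch assumes whitespace-separated BED rows with no header lines. Two differences are worth knowing: bedtools merge by default also merges book-ended features, whereas this sketch treats rows that merely touch as non-overlapping, and rows come out in sorted order rather than input order.

```shell
# Hypothetical helper: print the BED rows that overlap no other row.
# Usage: filter_isolated IN.bed > OUT.bed
filter_isolated() {
    # Sort by chromosome, then numerically by start, so a row can only
    # overlap rows that are near it in the stream.
    sort -k1,1 -k2,2n "$1" | awk '
        {
            lo = $2 + 0; hi = $3 + 0
            # Does this row overlap anything seen earlier on this chromosome?
            hit = ($1 == chrom && lo < maxend)
            # The held-back row is printed only if nothing before it
            # and nothing after it (this row) overlaps it.
            if (NR > 1 && clean && !hit) print held
            # Track the furthest end seen so far on the current chromosome.
            if ($1 != chrom) { chrom = $1; maxend = hi }
            else if (hi > maxend) maxend = hi
            held = $0; clean = !hit
        }
        END { if (NR > 0 && clean) print held }
    '
}
```

Each row is held back for one step because whether it is isolated depends on the row that follows it; carrying the whole line along is what preserves the name, score and strand columns without a separate intersect step.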


Source: https://stackoverflow.com/questions/43432149/filter-overlapping-entries-in-bed-file
