Question
I have a BED file that looks like this:
1 183113 183114 chr1:183113-183240 0 +
1 187286 187287 chr1:187128-187287 0 -
1 187576 187587 chr1:187375-187577 0 -
1 187580 187590 chr1:187379-187577 0 -
My aim is to extract only those rows whose intervals do not overlap any other row. For some time I have been trying bedtools merge, following its documentation. My plan was to use its options to count the entries that make up each merged fragment and then keep only the fragments with a count of 1. Here is the problem: I don't know how to keep the strand, the score (this should always be 0) and the name (this might be reconstructed from the first 3 columns). Does anyone know how to put these things together?
The output should have exactly the same format as the input BED above, but contain only the rows that do not overlap anything else:
1 183113 183114 chr1:183113-183240 0 +
1 187286 187287 chr1:187128-187287 0 -
Answer 1:
OK, I worked this out:
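1) Count how many original entries make up each merged region (bedtools merge expects the input to be sorted by chromosome and then by start position)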
bedtools merge -i IN.bed -c 1 -o count > counted
2) Keep only the merged regions with a count of 1, i.e. those built from a single entry that does not overlap anything
awk '/\t1$/{print}' counted > filtered
3) Intersect the filtered regions with the original input; -wa writes the original rows from IN.bed, so the name, score and strand columns are kept
bedtools intersect -a IN.bed -b filtered -wa > OUT.bed
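For a quick cross-check of the three steps above without bedtools, here is a minimal sketch using only sort and awk. It is not the method from the answer, just the same idea in one pass: group sorted rows into clusters and print a row only when its cluster contains a single entry. The file names sample_IN.bed and sample_OUT.bed are made up for illustration, and the sketch assumes a tab-separated, six-column file. Like bedtools merge with its default settings, it treats book-ended entries (start equal to the previous end) as belonging to the same cluster.

```shell
#!/bin/sh
# Hypothetical file names; the rows are the sample from the question.
in_bed=sample_IN.bed
out_bed=sample_OUT.bed

# Write the four sample rows as tab-separated columns
# (printf reuses the format for every group of six arguments).
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  1 183113 183114 chr1:183113-183240 0 + \
  1 187286 187287 chr1:187128-187287 0 - \
  1 187576 187587 chr1:187375-187577 0 - \
  1 187580 187590 chr1:187379-187577 0 - > "$in_bed"

# Sort by chromosome, then numerically by start, and cluster in one pass.
sort -k1,1 -k2,2n "$in_bed" | awk -F'\t' '
  # Print the buffered row only if its cluster held exactly one entry.
  function emit() { if (n == 1) print first }

  # A new cluster begins on the first row, on a new chromosome, or when the
  # start lies strictly past the running cluster end. Use ">=" instead of ">"
  # to treat book-ended entries as non-overlapping.
  NR == 1 || $1 != chrom || $2 + 0 > end + 0 {
    emit()
    chrom = $1; end = $3; first = $0; n = 1
    next
  }

  # Otherwise the row joins the current cluster and may extend its end.
  { n++; if ($3 + 0 > end + 0) end = $3 }

  END { emit() }
' > "$out_bed"

cat "$out_bed"
```

On the sample this leaves the first two rows, matching the expected output in the question. The rows come out in sorted order, which may differ from the input order if IN.bed was not sorted to begin with.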
Source: https://stackoverflow.com/questions/43432149/filter-overlapping-entries-in-bed-file