Mpileup regex command to remove indels

半城伤御伤魂 提交于 2019-12-13 07:17:46

问题


I am trying to filter out insertions and deletions from an mpileup txt file. An example of an insertion or deletion would be +3ATG or -9AATCGTCTC.

In another post I found a solution using perl:

regular expression that reference a match from earlier part of expression

However, the script writes insertions and deletions to the special variable $&. I would like to replace all insertions and deletions with nothing in a new variable. So my solution is identical, but with substitution at the start and to be replaced with nothing, see below.

$row =~ s/(\d+)(??{"."*$1})//xg;

Does anyone have any idea why it won't work or an alternative solution?

I would also be happy to match anything that wasn't an insertion or deletion and make this a new variable.


Here is an example of the input:

$,...........................,,.................,,....,,g.,,,,,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.,...............,,,.....,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.....,,.....,,,,,,,,,,,......,,,,,,,,,,,,,,,,,,,,,,,,,,.,,.,,,.............................,,.,.........,.,.,,....,..........,,......................,,,,,,...........................,,,,,,,,.....,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.,,,,,,,,,,,,,,,,,,,,.+12GATGCTGTGTTT..,,,,,,,,.,,,,,,,,,,,,,,,,,,,,,,,.,,.,,-8tgatgctg,,,...,,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,..

Here is an example of the output I would like:

$,...........................,,.................,,....,,g.,,,,,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.,...............,,,.....,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.....,,.....,,,,,,,,,,,......,,,,,,,,,,,,,,,,,,,,,,,,,,.,,.,,,.............................,,.,.........,.,.,,....,..........,,......................,,,,,,...........................,,,,,,,,.....,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.,,,,,,,,,,,,,,,,,,,,.+..,,,,,,,,.,,,,,,,,,,,,,,,,,,,,,,,.,,.,,-,,,...,,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,..

Cheers,

Daniel


回答1:


Is this what you're after?

use feature qw(say);

my $DNA = ',...........,,....,,g.,,,,,,,,,,,.+12GATGCTGTGTTT..,,,,,.,,.,,-8tgatgctg,,,,,,,,..';

say $DNA;

$DNA =~ s/\d+[ATGCatgc]*//g;

say $DNA;

,...........,,....,,g.,,,,,,,,,,,.+12GATGCTGTGTTT..,,,,,.,,.,,-8tgatgctg,,,,,,,,..
,...........,,....,,g.,,,,,,,,,,,.+..,,,,,.,,.,,-,,,,,,,,..



回答2:


A slight variation on the pattern you already have should work:

$pileup = '$,...........................,,.................,,....,,g.,,,,,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.,...............,,,.....,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.....,,.....,,,,,,,,,,,......,,,,,,,,,,,,,,,,,,,,,,,,,,.,,.,,,.............................,,.,.........,.,.,,....,..........,,......................,,,,,,...........................,,,,,,,,.....,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.,,,,,,,,,,,,,,,,,,,,.+12GATGCTGTGTTT..,,,,,,,,.,,,,,,,,,,,,,,,,,,,,,,,.,,.,,-8tgatgctg,,,...,,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,..';

$pileup =~ s/[+-](\d+)(??{"[ACGTN]{$1}"})//gi;

print($pileup, "\n");

PRODUCES

$,...........................,,.................,,....,,g.,,,,,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.,...............,,,.....,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.....,,.....,,,,,,,,,,,......,,,,,,,,,,,,,,,,,,,,,,,,,,.,,.,,,.............................,,.,.........,.,.,,....,..........,,......................,,,,,,...........................,,,,,,,,.....,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.,,,,,,,,,,,,,,,,,,,,...,,,,,,,,.,,,,,,,,,,,,,,,,,,,,,,,.,,.,,,,,...,,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,..

Which you'll notice is a couple of characters shorter than your example output as you accidentally left in the signs [+-]



来源:https://stackoverflow.com/questions/37206608/mpileup-regex-command-to-remove-indels

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!