问题
I am trying to filter out insertions and deletions from an mpileup txt file. An example of an insertion or deletion would be +3ATG or -9AATCGTCTC.
In another post I found a solution using perl:
regular expression that reference a match from earlier part of expression
However, the script writes insertions and deletions to the special variable $&. I would like to replace all insertions and deletions with nothing in a new variable. So my solution is identical, but with substitution at the start and to be replaced with nothing, see below.
$row =~ s/(\d+)(??{"."*$1})//xg;
Does anyone have any idea why it won't work or an alternative solution?
I would also be happy to match anything that wasn't an insertion or deletion and make this a new variable.
Here is an example of the input:
$,...........................,,.................,,....,,g.,,,,,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.,...............,,,.....,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.....,,.....,,,,,,,,,,,......,,,,,,,,,,,,,,,,,,,,,,,,,,.,,.,,,.............................,,.,.........,.,.,,....,..........,,......................,,,,,,...........................,,,,,,,,.....,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.,,,,,,,,,,,,,,,,,,,,.+12GATGCTGTGTTT..,,,,,,,,.,,,,,,,,,,,,,,,,,,,,,,,.,,.,,-8tgatgctg,,,...,,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,..
Here is an example of the output I would like:
$,...........................,,.................,,....,,g.,,,,,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.,...............,,,.....,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.....,,.....,,,,,,,,,,,......,,,,,,,,,,,,,,,,,,,,,,,,,,.,,.,,,.............................,,.,.........,.,.,,....,..........,,......................,,,,,,...........................,,,,,,,,.....,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.,,,,,,,,,,,,,,,,,,,,.+..,,,,,,,,.,,,,,,,,,,,,,,,,,,,,,,,.,,.,,-,,,...,,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,..
Cheers,
Daniel
回答1:
Is this what you're after?
use feature qw(say);
my $DNA = ',...........,,....,,g.,,,,,,,,,,,.+12GATGCTGTGTTT..,,,,,.,,.,,-8tgatgctg,,,,,,,,..';
say $DNA;
$DNA =~ s/\d+[ATGCatgc]*//g;
say $DNA;
,...........,,....,,g.,,,,,,,,,,,.+12GATGCTGTGTTT..,,,,,.,,.,,-8tgatgctg,,,,,,,,..
,...........,,....,,g.,,,,,,,,,,,.+..,,,,,.,,.,,-,,,,,,,,..
回答2:
A slight variation on the pattern you already have should work:
$pileup = '$,...........................,,.................,,....,,g.,,,,,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.,...............,,,.....,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.....,,.....,,,,,,,,,,,......,,,,,,,,,,,,,,,,,,,,,,,,,,.,,.,,,.............................,,.,.........,.,.,,....,..........,,......................,,,,,,...........................,,,,,,,,.....,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.,,,,,,,,,,,,,,,,,,,,.+12GATGCTGTGTTT..,,,,,,,,.,,,,,,,,,,,,,,,,,,,,,,,.,,.,,-8tgatgctg,,,...,,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,..';
$pileup =~ s/[+-](\d+)(??{"[ACGTN]{$1}"})//gi;
print($pileup, "\n");
PRODUCES
$,...........................,,.................,,....,,g.,,,,,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.,...............,,,.....,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.....,,.....,,,,,,,,,,,......,,,,,,,,,,,,,,,,,,,,,,,,,,.,,.,,,.............................,,.,.........,.,.,,....,..........,,......................,,,,,,...........................,,,,,,,,.....,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,.,,,,,,,,,,,,,,,,,,,,...,,,,,,,,.,,,,,,,,,,,,,,,,,,,,,,,.,,.,,,,,...,,..,,,,,,,,,,,,,,,,,,,,,,,,,,,,..
Which you'll notice is a couple of characters shorter than your example output as you accidentally left in the signs [+-]
来源:https://stackoverflow.com/questions/37206608/mpileup-regex-command-to-remove-indels