Question
I was wondering whether anyone knows how to simplify or generalize this code. It gives the correct answer, but it only works for the current situation. My code is as follows:
    sub longestRepeat {
        # list of arguments @_ is: (sequence, nucleotide)
        my $someSequence = shift(@_);    # shift off the first argument from the list
        my $whatBP       = shift(@_);    # shift off the second argument from the list
        my $match        = 0;
        if ($whatBP eq "AT") {
            if ($someSequence =~ m/(([A][T])\2\2\2\2\2)/g) {
                $match = $1;
            }
            return $match;
        }
        if ($whatBP eq "TAGA") {
            if ($someSequence =~ m/(([T][A][G][A])\2\2)/g) {
                $match = $1;
            }
            return $match;
        }
        if ($whatBP eq "C") {
            if ($someSequence =~ m/(([C])\2\2)/g) {
                $match = $1;
            }
            return $match;
        }
    }
My question concerns the inner if statements, where the number of pattern repeats is hard-coded (it fits the string we were given). Is there a way to use a while loop to keep matching the \2 back-reference (the repeated pattern)? In other words, can if ($someSequence =~ m/(([A][T])\2\2\2\2\2)/g) be simplified and generalized with a while loop?
Answer 1:
Based on the name of your subroutine, I'm assuming that you want to find the longest repeat sequence in your sequence.
If so, how about the following:
    sub longest_repeat {
        my ( $sequence, $what ) = @_;

        my @matches = $sequence =~ /((?:$what)+)/g;    # Store all matches

        my $longest;
        foreach my $match (@matches) {    # Could also avoid the temp variable:
                                          # for my $match ( $sequence =~ /((?:$what)+)/g )
            $longest //= $match;          # Initialize (could also do
                                          # `$longest = $match unless defined $longest`)
            $longest = $match if length($longest) < length($match);
        }

        return $longest;    # Note: this also handles the case of no matches (returns undef)
    }
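Since the question asked specifically about a while loop, here is a minimal sketch of the same idea written that way. The subroutine name `longest_repeat_literal` and the sample sequence are made up for illustration, and the `\Q...\E` quoting is an addition that assumes `$what` should be matched as a literal string rather than as a regex:

```perl
use strict;
use warnings;

# Hypothetical variant of longest_repeat; the name is illustrative only.
sub longest_repeat_literal {
    my ( $sequence, $what ) = @_;
    my $longest;

    # In scalar context, a /g match resumes where the previous one ended,
    # so each pass of the loop sees the next run of repeats.
    # \Q...\E quotes regex metacharacters in $what (an assumption: we want
    # a literal match).
    while ( $sequence =~ /((?:\Q$what\E)+)/g ) {
        $longest = $1
            if !defined $longest || length($1) > length($longest);
    }

    return $longest;    # undef when $what never occurs
}

# Made-up sample sequence, for illustration only.
my $dna = 'GGATATCCATATATATGGTAGATAGAC';

my $run = longest_repeat_literal( $dna, 'AT' );
print "$run\n";                                   # ATATATAT
print length($run) / length('AT'), " repeats\n";  # 4 repeats
```

The `+` quantifier replaces the hard-coded `\2\2\2...` chain, so there is no need for one branch per nucleotide pattern.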
If you can digest that, the following version achieves essentially the same functionality with a Schwartzian transform:
    sub longest_repeat {
        my ( $sequence, $what ) = @_;                  # Example:
                                                       # --------------------
        my ($longest) = map  { $_->[0] }               # 'ATAT' ...
                        sort { $b->[1] <=> $a->[1] }   # ['ATAT',4], ['AT',2]
                        map  { [ $_, length($_) ] }    # ['AT',2], ['ATAT',4]
                        $sequence =~ /((?:$what)+)/g;  # ... 'AT', 'ATAT'

        return $longest;
    }
Some may argue that sorting is wasteful because it is O(n log n) instead of O(n), but there's variety for ya.
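For anyone who wants the compact style without the sort, a single pass with `reduce` from the core `List::Util` module is one option. This is only a sketch: the name `longest_repeat_linear` is hypothetical, and the `\Q...\E` quoting again assumes `$what` is meant literally:

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Hypothetical name; carries the longest run seen so far through one pass.
sub longest_repeat_linear {
    my ( $sequence, $what ) = @_;
    my @runs = $sequence =~ /((?:\Q$what\E)+)/g;

    # reduce returns undef for an empty list, so "no match" yields undef.
    # On ties, the earlier run ($a) is kept.
    return reduce { length($b) > length($a) ? $b : $a } @runs;
}

my $longest = longest_repeat_linear( 'CCACCCAC', 'C' );
print "$longest\n";    # CCC
```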
Source: https://stackoverflow.com/questions/19968434/perl-loops-within-subroutines-to-display-the-longest-repeating-string-thats-sele