perl loops within subroutines to display the longest repeating string thats selected for a particular subsection of the string [duplicate]

微笑、不失礼 提交于 2019-12-11 23:29:22

问题


I was wondering if anyone knows how to simplify, or generalize this code. It gives the correct answer, however it is only applicable to the current situation. My code is as follows:

sub longestRepeat{
                                # list of argument @_ is: (sequence, nucleotide)
  my $someSequence = shift(@_);  # shift off the first  argument from the list
  my $whatBP       = shift(@_);  # shift off the second argument from the list
  my $match = 0;



        if ($whatBP eq "AT"){
            if ($someSequence =~ m/(([A][T])\2\2\2\2\2)/g) {

            $match = $1
            }
            return $match;

        }
        if ($whatBP eq "TAGA"){
            if ($someSequence =~ m/(([T][A][G][A])\2\2)/g) {

            $match = $1
            }
            return $match;
        }

        if ($whatBP eq "C"){
            if ($someSequence =~ m/(([C])\2\2)/g) {

            $match = $1
            }
            return $match;
        }
}   

My question is, in the second if statement, I have it set to a set amount of that pattern being repeated (applicable for the string we were given). However, is there a way to keep doing a while loop to search through the \2 (pattern repeat)? What I mean is can this: if ($someSequence =~ m/(([A][T])\2\2\2\2\2)/g) be simplified and generalized with a while loop


回答1:


Based on the name of your subroutine, I'm assuming that you want to find the longest repeat sequence in your sequence.

If so, how about the following:

sub longest_repeat {

    my ( $sequence, $what ) = @_;

    my @matches = $sequence =~ /((?:$what)+)/g ;  # Store all matches

    my $longest;
    foreach my $match ( @matches ) {  # Could also avoid temp variable :
                                      # for my $match ( $sequence =~ /((?:$what)+)/g )

        $longest //= $match ;         # Initialize
                                      #  (could also do `$longest = $match
                                      #                    unless defined $match`)

        $longest = $match if length( $longest ) < length( $match );
    }

    return $longest;  # Note this also handles the case of no matches
}

If you can digest that, the following version achieves essentially the same functionality with a Schwartzian transform:

sub longest_repeat {

    my ( $sequence, $what ) = @_;                          # Example:
                                                           # --------------------
    my ( $longest ) = map { $_->[0] }                      # 'ATAT' ...
                        sort { $b->[1] <=> $a->[1] }       # ['ATAT',4], ['AT',2]
                          map { [ $_, length($_) ] }       # ['AT',2], ['ATAT',4]
                            $sequence =~ /((?:$what)+)/g ; # ... 'AT', 'ATAT'

    return $longest ;
}

Some may argue that it is wasteful to sort because it is O(n.log(n)) instead of O(n) but there's variety for ya.



来源:https://stackoverflow.com/questions/19968434/perl-loops-within-subroutines-to-display-the-longest-repeating-string-thats-sele

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!