How to find the max. number of times a sequence of characters repeats consecutively in Python?

Question

I'm working on the cs50/pset6/dna project. I'm struggling to find a way to analyze a string and get the maximum number of times a certain sequence of characters repeats consecutively. Here is an example:

String: JOKHCNHBVDBVDBVDJHGSBVDBVD

Sequence of characters I should look for: BVD

Result: My function should return 3, because at one point the characters BVD repeat three times consecutively. They repeat again two times later on, but I need the longest run. If you still need a better explanation, please look at the first 1:16 of this video: https://www.youtube.com/watch?time_continue=221&v=j84b_EgntcQ&feature=emb_title

I would LOVE if you could help, thanks!

Answer 1:

It's a bit crude, but one brute-force-ish way is to check for the longest possible run of the pattern first and work downward. As soon as a run is found in the string, stop looking:

EDIT - Using a function might be more straightforward:

def get_longest_repeating_pattern(string, pattern):
    if not pattern:
        return ""
    for i in range(len(string)//len(pattern), 0, -1):
        current_pattern = pattern * i
        if current_pattern in string:
            return current_pattern
    return ""

string = "JOKHCNHBVDBVDBVDJHGSBVDBVD"
pattern = "BVD"


longest_repeating_pattern = get_longest_repeating_pattern(string, pattern)
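# The match is pattern * i, so dividing its length by len(pattern) gives the repeat count i
print(len(longest_repeating_pattern) // len(pattern))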

EDIT - explanation:

First, just a simple for-loop that starts at a larger number and goes down to a smaller number. For example, we start at 5 and go down to 0 (but not including 0), with a step size of -1:

>>> for i in range(5, 0, -1):
    print(i)

    
5
4
3
2
1
>>>

If string = "JOKHCNHBVDBVDBVDJHGSBVDBVD", then len(string) is 26, and if pattern = "BVD", then len(pattern) is 3.

Back to my original code:

for i in range(len(string)//len(pattern), 0, -1):

Plugging in the numbers:

for i in range(26//3, 0, -1):

26//3 is an integer division which yields 8, so this becomes:

for i in range(8, 0, -1):

So, it's a for-loop that goes from 8 down to 1 (remember, it doesn't go down to 0). i takes on a new value for each iteration: first 8, then 7, etc.

In Python, you can "multiply" strings, like so:

>>> pattern = "BVD"
>>> pattern * 1
'BVD'
>>> pattern * 2
'BVDBVD'
>>> pattern * 3
'BVDBVDBVD'
>>>
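To tie the two pieces together, here is a small trace of what the loop tries on each iteration. This is only an illustrative sketch: trace_candidates is a hypothetical helper name, not part of the function above, and it simply reports every candidate along with whether it occurs in the string.

```python
def trace_candidates(string, pattern):
    """Yield (i, candidate, found) for each loop iteration, longest candidate first."""
    for i in range(len(string) // len(pattern), 0, -1):
        candidate = pattern * i          # e.g. "BVD" * 3 == "BVDBVDBVD"
        yield i, candidate, candidate in string

string = "JOKHCNHBVDBVDBVDJHGSBVDBVD"
pattern = "BVD"

for i, candidate, found in trace_candidates(string, pattern):
    print(i, candidate, found)
    if found:
        break  # the first hit is the longest run, so stop here
```

For this input the candidates for i = 8 down to 4 are not found, and the loop stops at i = 3.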

Answer 2:

A slightly less brute-force solution:

string = 'JOKHCNHBVDBVDBVDJHGSBVDBVD'
key = 'BVD'

len_k = len(key)
max_l = 0
passes = 0
curr_len=0

# Note: matched windows are skipped, so this assumes the key cannot overlap itself (true for 'BVD')
for i in range(len(string) - len_k + 1):  # slide a window of the same length as key over the string
    if passes > 0:  # still inside a key that already matched: if key is 'BVD', ignore 'VD.' and 'D..'
        passes -= 1
        continue
    s = string[i:i+len_k]
    if s == key:
        curr_len += 1  # one more back-to-back repeat
        if curr_len > max_l:
            max_l = curr_len
        passes = len_k - 1  # jump to the window right after this match
    else:
        curr_len = 0  # the run is broken, start counting again
    
print(max_l)
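The loop above skips ahead after every match, which works for BVD but can undercount when the key overlaps itself (for example ABA in ABABAABA, where the longest run starts in the middle of an earlier match). As a rough sketch of a different approach, not the method used in this answer, the count can be restarted from every position instead. The name longest_run is made up for illustration.

```python
def longest_run(string, key):
    """Return the largest number of back-to-back copies of key found in string."""
    if not key:
        return 0  # an empty key has no meaningful repeat count
    best = 0
    step = len(key)
    for start in range(len(string) - step + 1):
        count = 0
        pos = start
        # Walk forward one key-length at a time while the key keeps matching.
        while string.startswith(key, pos):
            count += 1
            pos += step
        best = max(best, count)
    return best

print(longest_run("JOKHCNHBVDBVDBVDJHGSBVDBVD", "BVD"))  # 3
print(longest_run("ABABAABA", "ABA"))                    # 2
```

This re-checks some positions, so it does more work than the skip-ahead loop, but it does not depend on the shape of the key.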

Answer 3:

You can do that very easily, elegantly and efficiently using a regex.

We look for all sequences of at least one repetition of your search string. Then, we just need to take the maximum length of these sequences, and divide by the length of the search string.

The regex we use is '(?:<your_sequence>)+': at least one repetition (the +) of the group (<your_sequence>). The ?: is just here to make the group non-capturing, so that findall returns the whole match, and not just the group.

In case there is no match, we use the default parameter of the max function to return 0.

The code is very short, then:

import re

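def max_consecutive_repetitions(search, data):
    # re.escape keeps characters such as '.' or '+' in search from being read as regex syntax
    search_re = re.compile('(?:' + re.escape(search) + ')+')
    return max((len(seq) for seq in search_re.findall(data)), default=0) // len(search)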

Sample run:

print(max_consecutive_repetitions("BVD", "JOKHCNHBVDBVDBVDJHGSBVDBVD"))
# 3
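To see why the group has to be non-capturing, and what the default argument does, the following sketch compares the two forms of the pattern on the sample string. The variable names are chosen only for illustration.

```python
import re

data = "JOKHCNHBVDBVDBVDJHGSBVDBVD"

# Capturing group: findall returns only the group's last repetition for each match.
capturing = re.findall(r"(BVD)+", data)
print(capturing)       # ['BVD', 'BVD']

# Non-capturing group: findall returns each whole run.
non_capturing = re.findall(r"(?:BVD)+", data)
print(non_capturing)   # ['BVDBVDBVD', 'BVDBVD']

# With no match the generator is empty, so max falls back to default instead of raising ValueError.
no_match = max((len(run) for run in re.findall(r"(?:BVD)+", "XYZ")), default=0)
print(no_match)        # 0
```

The longest run here is 9 characters, and 9 // 3 gives the 3 repetitions reported by the sample run.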

Source: https://stackoverflow.com/questions/62916456/how-to-find-the-max-number-of-times-a-sequence-of-characters-repeats-consecutiv

Tags

python

python-3.x

string

Sequence

cs50