How can I tell if a string repeats itself in Python?

栀梦 2020-11-27 09:07

I'm looking for a way to test whether a given string repeats itself for the entire string.

Examples:

[
    '0045662100456621004566210


        
13 answers
  • 2020-11-27 09:25

    This version tries only candidate sequence lengths that are factors of the string length, and uses the * operator to build a full-length string from each candidate sequence:

    def get_shortest_repeat(string):
        length = len(string)
        for i in range(1, length // 2 + 1):
            if length % i:  # skip non-factors early
                continue
    
            candidate = string[:i]
            if string == candidate * (length // i):
                return candidate
    
        return None
    

    Thanks to TigerhawkT3 for noticing that length // 2 without + 1 would fail to match the abab case.
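
    To see why the + 1 matters, here is a small sketch that lists the prefix lengths the loop actually tries. The helper name candidate_lengths and its include_half flag are hypothetical and only mirror the loop bounds and the factor check above; they are not part of the answer.

```python
def candidate_lengths(length, include_half=True):
    """Return the prefix lengths tried for a string of the given length.

    Hypothetical helper for illustration: it mirrors the range bounds and
    the factor check of get_shortest_repeat, without comparing any strings.
    """
    stop = length // 2 + 1 if include_half else length // 2
    return [i for i in range(1, stop) if length % i == 0]


# For 'abab' (length 4) the only repeating prefix is 'ab', of length 2.
print(candidate_lengths(4))                      # [1, 2]
print(candidate_lengths(4, include_half=False))  # [1]  (length 2 is never tried)
print(candidate_lengths(12))                     # [1, 2, 3, 4, 6]
```

    Without the + 1, the half-length candidate is skipped, so any string made of exactly two copies of its period would go unmatched.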

  • 2020-11-27 09:28

    The problem can also be solved in O(n) in the worst case with the prefix function.

    Note that it may be slower in the general case (UPD: and it is much slower) than the other solutions. Their cost depends on the number of divisors of n, but their comparisons usually fail early. I think one of the bad cases for them is aaa...aab, where there are n - 1 = 2 * 3 * 5 * 7 * ... * p_n - 1 a's.

    First of all, you need to calculate the prefix function:

    def prefix_function(s):
        # pi[i] is the length of the longest proper prefix of s[:i + 1]
        # that is also a suffix of it
        n = len(s)
        pi = [0] * n
        for i in range(1, n):
            j = pi[i - 1]
            while j > 0 and s[i] != s[j]:
                j = pi[j - 1]
            if s[i] == s[j]:
                j += 1
            pi[i] = j
        return pi
    

    Then either there's no answer, or the shortest period is

    k = len(s) - prefix_function(s)[-1]
    

    and you just have to check whether k != n and n % k == 0, where n = len(s): if so, the answer is s[:k]; otherwise there's no answer.
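
    As a worked example of that check, the sketch below computes the prefix function for two sample strings and applies the k != n and n % k == 0 test. The sample strings are my own choice for illustration, and the function is repeated only so that the snippet runs on its own.

```python
def prefix_function(s):
    """pi[i] = length of the longest proper prefix of s[:i + 1] that is also its suffix."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        j = pi[i - 1]
        while j > 0 and s[i] != s[j]:
            j = pi[j - 1]
        if s[i] == s[j]:
            j += 1
        pi[i] = j
    return pi


for s in ("abcabcabc", "abcab"):
    pi = prefix_function(s)
    n = len(s)
    k = n - pi[-1]                     # candidate shortest period
    repeats = k != n and n % k == 0    # the period must tile the whole string
    print(s, pi, k, s[:k] if repeats else None)

# abcabcabc [0, 0, 0, 1, 2, 3, 4, 5, 6] 3 abc
# abcab [0, 0, 0, 1, 2] 3 None
```

    In the second case k is 3 as well, but 5 % 3 != 0, so the string only has a partial final repetition and there is no answer.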

    You may check the proof here (in Russian, but online translator will probably do the trick)

    def riad(s):
        n = len(s)
        pi = [0] * n  # prefix function of s
        for i in range(1, n):
            j = pi[i - 1]
            while j > 0 and s[i] != s[j]:
                j = pi[j - 1]
            if s[i] == s[j]:
                j += 1
            pi[i] = j
        k = n - pi[-1]  # length of the shortest candidate period
        return s[:k] if (n != k and n % k == 0) else None
    
  • 2020-11-27 09:30

    First, halve the string as long as it's a "2 part" duplicate. This reduces the search space if there are an even number of repeats. Then, working forwards to find the smallest repeating string, check if splitting the full string by increasingly larger sub-string results in only empty values. Only sub-strings up to length // 2 need to be tested since anything over that would have no repeats.

    def shortest_repeat(orig_value):
        if not orig_value:
            return None
    
        value = orig_value
    
        while True:
            len_half = len(value) // 2
            first_half = value[:len_half]
    
            if first_half != value[len_half:]:
                break
    
            value = first_half
    
        len_value = len(value)
        split = value.split
    
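        # + 1 so that i == len_value // 2 is tried too; without it a
        # three-character value such as 'aaa' is never tested with i == 1
        for i in (i for i in range(1, len_value // 2 + 1) if len_value % i == 0):
            if not any(split(value[:i])):
                return value[:i]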
    
        return value if value != orig_value else None
    

    This returns the shortest match or None if there is no match.
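
    The "only empty values" test relies on how str.split behaves when the separator tiles the whole string. A minimal sketch, with sample strings chosen purely for illustration:

```python
tiled = "abcabcabc".split("abc")      # 'abc' tiles the whole string
not_tiled = "abcabcabd".split("abc")  # the last chunk differs

print(tiled)           # ['', '', '', '']
print(not_tiled)       # ['', '', 'abd']
print(any(tiled))      # False: every piece is empty, so 'abc' is a repeat unit
print(any(not_tiled))  # True: a non-empty leftover means no full repeat
```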

  • 2020-11-27 09:31

    David Zhang's answer does not work if the input comes from some sort of circular buffer and therefore starts in the middle of the pattern. For example, principal_period('6210045662100456621004566210045662100456621') finds nothing because of the leading 621, where I would have liked it to return 00456621.

    Extending his solution, we can use the following:

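    def principal_period(s):
        for j in range(len(s) // 2):
            tail = s[j:]  # skip a possible partial pattern of length j
            idx = (tail + tail).find(tail, 1, -1)
            if idx != -1:
                period = tail[:idx]
                # Make sure that the skipped prefix is the end of the pattern
                # (j == 0 means nothing was skipped, so there is nothing to check)
                if j == 0 or s[:j] == period[-j:]:
                    return period
    
        return None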
    
    principal_period('6210045662100456621004566210045662100456621')  # returns '00456621'
    
  • 2020-11-27 09:33

    Here's a concise solution which avoids regular expressions and slow in-Python loops:

    def principal_period(s):
        i = (s+s).find(s, 1, -1)
        return None if i == -1 else s[:i]
    

    See the Community Wiki answer started by @davidism for benchmark results. In summary,

    David Zhang's solution is the clear winner, outperforming all others by at least 5x for the large example set.

    (That answer's words, not mine.)

    This is based on the observation that a string is periodic if and only if it is equal to a nontrivial rotation of itself. Kudos to @AleksiTorhamo for realizing that we can then recover the principal period from the index of the first occurrence of s in (s+s)[1:-1], and for informing me of the optional start and end arguments of Python's str.find.
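
    To make the rotation argument concrete, the sketch below prints the index that find returns for a few sample inputs. The helper name first_rotation_index and the sample strings are hypothetical, introduced only for this illustration.

```python
def first_rotation_index(s):
    """Index of the first nontrivial rotation of s that equals s, or -1.

    Hypothetical helper: start=1 and end=-1 exclude the two trivial
    matches of s inside s + s, at offsets 0 and len(s).
    """
    return (s + s).find(s, 1, -1)


for s in ("abab", "abcabcabc", "abc", "aab"):
    i = first_rotation_index(s)
    print(repr(s), i, None if i == -1 else s[:i])

# 'abab' 2 ab
# 'abcabcabc' 3 abc
# 'abc' -1 None
# 'aab' -1 None
```

    The returned index is the length of the principal period, which is why slicing s[:i] recovers it directly.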

  • 2020-11-27 09:37

    Non-regex solution:

    def repeat(string):
        for i in range(1, len(string)//2+1):
            if not len(string)%len(string[0:i]) and string[0:i]*(len(string)//len(string[0:i])) == string:
                return string[0:i]
    

    Faster non-regex solution, thanks to @ThatWeirdo (see comments):

    def repeat(string):
        length = len(string)
        for i in range(1, length // 2 + 1):
            if length % i:  # i is not a factor of the length
                continue
            s = string[:i]
            if s * (length // i) == string:
                return s
        return None
    

    The above solution is very rarely slower than the original, and then only by a few percent; it's usually a good bit faster, and sometimes a whole lot faster. It's still not faster than davidism's for longer strings, and zero's regex solution is superior for short strings. It comes out fastest (according to davidism's test on GitHub; see his answer) for strings of about 1000-1500 characters. Regardless, it's reliably second-fastest (or better) in all cases I tested. Thanks, ThatWeirdo.

    Test:

    print(repeat('009009009'))
    print(repeat('254725472547'))
    print(repeat('abcdeabcdeabcdeabcde'))
    print(repeat('abcdefg'))
    print(repeat('09099099909999'))
    print(repeat('02589675192'))
    

    Results:

    009
    2547
    abcde
    None
    None
    None
    