How to find the period of a string

花落未央 2021-02-03 12:58

I take input from the user: a string made up of a certain substring that repeats throughout the whole string. I need to output that substring or its length, also known as the period.

5 answers
  • 2021-02-03 13:23

    Let me assume that the length n of the string is at least twice the period p.

    Algorithm

    1. Let m = 1, and let S be the whole string.
    2. Set m = m*2.
      • Find the next occurrence of the prefix S[:m], that is, the first occurrence starting after position 0.
      • Let k be the start of that occurrence.
      • Check whether S[:k] is the period.
      • If it is not, go back to step 2.
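
    The steps above can be sketched in Python as follows. This is a rough sketch rather than a worked-out implementation: the names `is_period` and `period_by_doubling` are made up for illustration, `str.find` stands in for the KMP-like search, and the brute-force fallback at the end is an assumption added to cover the edge cases the answer leaves open (for example, when the doubled prefix no longer fits a second time).

```python
def is_period(s: str, k: int) -> bool:
    """True if s repeats with shift k (the last repetition may be cut off)."""
    return all(s[i] == s[i - k] for i in range(k, len(s)))


def period_by_doubling(s: str) -> int:
    n = len(s)
    m = 1
    while m < n:
        m *= 2                      # step 2: double the prefix length
        k = s.find(s[:m], 1)        # next full occurrence of S[:m] after position 0
        if k == -1:
            break                   # prefix does not repeat in full: open edge case
        if is_period(s, k):         # check whether S[:k] is the period
            return k
    # Fallback (an assumption, not part of the steps above): try every length.
    for k in range(1, n + 1):
        if is_period(s, k):
            return k
    return n                        # only reached for the empty string


print(period_by_doubling("CDCDFBFCDCDFDFCDCDFBFCDCDFDFCDC"))
```

    Because the prefix S[:m] must also occur at the true period whenever it fits there, a candidate k that passes the check cannot be larger than the smallest period, so the value returned inside the loop is the smallest one.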

    Example

    Suppose we have a string

    CDCDFBFCDCDFDFCDCDFBFCDCDFDFCDC
    

    For each m that is a power of 2, we find the repetitions of the first m characters. Then we extend this prefix up to its second occurrence. Let's start with m = 2^1 = 2, so CD.

    CDCDFBFCDCDFDFCDCDFBFCDCDFDFCDC
    CDCD   CDCD   CDCD   CDCD   CD
    

    We don't extend CD, since the next occurrence comes right after it. However, CD is not the substring we are looking for, so let's take the next power: 2^2 = 4, which gives the substring CDCD.

    CDCDFBFCDCDFDFCDCDFBFCDCDFDFCDC
    CDCD   CDCD
    

    Now let's extend our substring up to its first repetition. We get

    CDCDFBF
    

    We check whether this is the period. It is not, so we go further. We try 2^3 = 8, so CDCDFBFC.

    CDCDFBFCDCDFDFCDCDFBFCDCDFDFCDC
    CDCDFBFC      CDCDFBFC      
    

    We extend it up to its next occurrence and get

    CDCDFBFCDCDFDF
    

    and this is indeed our period.

    I expect this to run in O(n log n) if a KMP-like algorithm is used to find where a given string occurs. Note that some edge cases still need to be worked out here.

    Intuitively this should work, but my intuition failed once on this problem already so please correct me if I'm wrong. I will try to figure out a proof.

    A very nice problem though.

  • 2021-02-03 13:23

    I too have been looking for the time-space-optimal solution to this problem. The accepted answer by tmyklebu essentially seems to be it, but I would like to offer some explanation of what it's actually about and some further findings.

    First, this question by me proposes a seemingly promising but incorrect solution, with notes on why it's incorrect: Is this algorithm correct for finding period of a string?

    In general, the problem "find the period" is equivalent to "find the pattern within itself" (in some sense, "strstr(x+1,x)"), but with no constraint on matches that run past its end. This means that you can find the period by taking any left-to-right string matching algorithm and applying it to the string itself, treating a partial match that hits the end of the haystack/text as a match. The time and space requirements are then the same as those of whatever string matching algorithm you use.
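
    To make the "pattern within itself" idea concrete, here is a deliberately naive quadratic sketch. The function name `period_by_self_match` is made up for illustration, and the plain prefix comparison stands in for a real left-to-right matcher; it only shows what "a partial match that hits the end counts as a match" means.

```python
def period_by_self_match(x: str) -> int:
    """Smallest shift at which x matches itself up to the end of x."""
    n = len(x)
    for shift in range(1, n):
        # x[shift:] is the part of the text that remains at this alignment.
        # If it is a prefix of x, the pattern matches until it runs off
        # the end, which is counted as a match here.
        if x.startswith(x[shift:]):
            return shift
    return n  # no shorter shift works, so the whole string is its own period


print(period_by_self_match("abcabcab"))
```

    A linear-time matcher replaces the inner comparison without changing this outer logic.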

    The approach cited in tmyklebu's answer is essentially applying this principle to String Matching on Ordered Alphabets, also explained here. Another time-space-optimal solution should be possible using the GS algorithm.

    The fairly well-known and simple Two Way algorithm (also explained here) unfortunately is not a solution because it's not left-to-right. In particular, the advancement after a mismatch in the left factor depends on the right factor having been a match, and the impossibility of another match misaligned with the right factor modulo the right factor's period. When searching for the pattern within itself and disregarding anything past the end, we can't conclude anything about how soon the next right-factor match could occur (part or all of the right factor may have shifted past the end of the pattern), and therefore a shift that preserves linear time cannot be made.

    Of course, if working space is available, a number of other algorithms may be used. KMP is linear-time with O(n) space, and it may be possible to adapt it to something still reasonably efficient with only logarithmic space.
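
    For reference, the linear-time, O(n)-space KMP variant can be sketched as below. It relies on the standard fact that the smallest period equals the string length minus the length of its longest proper border (a prefix that is also a suffix). The function names are illustrative only.

```python
def prefix_function(s: str) -> list[int]:
    """pi[i] = length of the longest proper border of s[:i+1] (KMP failure table)."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        j = pi[i - 1]
        # Fall back through shorter borders until one can be extended by s[i].
        while j > 0 and s[i] != s[j]:
            j = pi[j - 1]
        if s[i] == s[j]:
            j += 1
        pi[i] = j
    return pi


def period_kmp(s: str) -> int:
    """Smallest period of s; the last repetition may be cut off."""
    if not s:
        return 0
    return len(s) - prefix_function(s)[-1]


print(period_kmp("CDCDFBFCDCDFDFCDCDFBFCDCDFDFCDC"))
```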

  • 2021-02-03 13:30

    You can build a suffix tree for the entire string in linear time (suffix trees are easy to look up online), and then recursively compute and store N(v), the number of suffix tree leaves below each internal node v. This is the number of occurrences of the substring spelled out on the path from the root to v, which is a prefix of some suffix. Also recursively compute and store L(v), the length of that substring, at each node of the tree. Then, at an internal node v, the substring encoded at v is a repeating substring that generates your string if N(v) equals the total length of the string divided by L(v).

  • 2021-02-03 13:43

    You can do this in linear time and constant additional space by inductively computing the period of each prefix of the string. I can't recall the details (there are several things to get right), but you can find them in Section 13.6 of "Text algorithms" by Crochemore and Rytter under function Per(x).

  • 2021-02-03 13:48

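    Well, if every character in the input string is part of the repeating substring, then all you have to do is store the first character and compare it with the rest of the string's characters one by one. If you find a match, the string up to the matched character is a candidate for your repeating substring. Note that this works as-is only if the first character does not occur again inside the period (in CDCDFBF..., for example, the first match gives CD); otherwise each candidate has to be verified against the rest of the string.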
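
    A minimal sketch of this idea follows. The verification of each candidate is an assumption added here so that a first character recurring inside the period does not give a wrong answer, and the name `period_by_first_char` is made up for illustration. The worst case is quadratic.

```python
def period_by_first_char(s: str) -> int:
    """Try each position where the first character recurs as a candidate period."""
    n = len(s)
    for k in range(1, n):
        if s[k] != s[0]:
            continue  # a repetition of the period must start with s[0]
        # Verify the candidate: every character must equal the one k places earlier.
        if all(s[i] == s[i - k] for i in range(k, n)):
            return k
    return n  # no repetition found: the whole string is the period


print(period_by_first_char("abab"))
```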
