How to find repeating sequence of characters in a given array?

故里飘歌 2020-12-02 12:42

My problem is to find the repeating sequence of characters in a given array. Simply put, I want to identify the pattern in which the characters appear.

  • 2020-12-02 13:07

    Using C++:

    #include <iostream>
    #include <set>
    #include <string>
    using namespace std;

    // Splits the string into fragments of the given size
    // and returns the set of distinct fragments.
    set<string> split(string s, int frag)
    {
        set<string> uni;
        int len = s.length();
        for(int i = 0; i < len; i += frag)
        {
            uni.insert(s.substr(i, frag));
        }

        return uni;
    }

    int main()
    {
        string out;
        string s = "carpentercarpenter";
        int len = s.length();

        // Optimistic approach: hope the string is just 2 repeated copies.
        // If that fails, try breaking it into shorter fragments.
        // i >= 1 so that one-character units (e.g. "aaa" -> "a") are found too.
        // Scanning downward returns the longest unit, e.g. "abab" for "abababab".
        for(int i = len/2; i >= 1; --i)
        {
            set<string> uni = split(s, i);
            if(uni.size() == 1)
            {
                out = *uni.begin();
                break;
            }
        }

        cout << out;
        return 0;
    }
    
  • 2020-12-02 13:07

    The first idea that comes to my mind is trying every candidate length that divides length(S) = N. Apart from N itself there are at most N/2 such lengths, and checking each candidate takes O(N), so this results in an O(N^2) algorithm.

    But I'm sure it can be improved...

  • 2020-12-02 13:11

    Pseudocode

    len = str.length
    for (i in 1..len) {
       if (len%i==0) {
          if (str==str.substr(0,i).repeat(len/i)) {
             return str.substr(0,i)
          }
       }
    }
    

    Note: For brevity, I'm using a "repeat" method for strings, where "abc".repeat(2) = "abcabc". Java's String only gained such a method, String.repeat(int), in Java 11; on older versions you have to write it yourself.

  • 2020-12-02 13:12

    Tongue-in-cheek O(NlogN) solution

    Perform an FFT on your string (treating characters as numeric values). Every peak in the resulting graph corresponds to a substring periodicity.

  • 2020-12-02 13:13

    For your examples, my first approach would be to

    1. get the first character of the array (for your last example, that would be C)
    2. get the index of the next appearance of that character in the array (e.g. 9)
    3. if it is found, search for the next appearance of the substring between the two appearances of the character (in this case CARPENTER)
    4. if it is found, you're done (and the result is this substring).

    Of course, this works only for a very limited subset of possible arrays, where the same word is repeated over and over again, starting from the beginning, without stray characters in between, and its first character is not repeated within the word. But all your examples fall into this category - and I prefer the simplest solution which could possibly work :-)

    If the repeated word contains the first character multiple times (e.g. CACTUS), the algorithm can be extended to look for subsequent occurrences of that character too, not only the first one (so that it finds the whole repeated word, not only a substring of it).

    Note that this extended algorithm would give a different result for your second example, namely RONRON instead of RON.

  • 2020-12-02 13:14

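    Put all your characters in an array, e.g. a[], holding count elements:

    int i = 0, j = 0;
    while (i + j + 1 < count)      // compare a[i] with a[i + shift], where shift = j + 1
    {
        if (a[i] == a[i + j + 1])
            ++i;                   // still consistent with the current shift
        else
        {
            ++j;                   // mismatch: try the next larger shift
            i = 0;
        }
    }
    

    Then p = j + 1 is the smallest shift at which the array matches itself (its smallest period). If count is divisible by p, the array is a[0]..a[j] repeated count / p times; otherwise it is not an exact repetition. The loop condition i + j + 1 < count keeps a[i + j + 1] inside the array. It is a simple solution.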
