How to find repeating sequence of characters in a given array?

后端未结

关注

 14  777

My problem is to find the repeating sequence of characters in the given array. simply, to identify the pattern in which the characters are appearing.


                      
              相关标签:


      
      
        
          14条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  臣服心动        
                
              
                            
                2020-12-02 13:16
              
            
            
                                                                       
Not sure how you define "efficiently". For easy/fast implementation you could do this in Java:

    private static String findSequence(String text) {
        Pattern pattern = Pattern.compile("(.+?)\\1+");
        Matcher matcher = pattern.matcher(text);
        return matcher.matches() ? matcher.group(1) : null;
    }


it tries to find the shortest string (.+?) that must be repeated at least once (\1+) to match the entire input text.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  猫巷女王i        
                
              
                            
                2020-12-02 13:17
              
            
            
                                                                       
Here is a more general solution to the problem, that will find repeating subsequences within an sequence (of anything), where the subsequences do not have to start at the beginning, nor immediately follow each other.

given an sequence b[0..n], containing the data in question, and a threshold t being the minimum subsequence length to find,

l_max = 0, i_max = 0, j_max = 0;
for (i=0; i<n-(t*2);i++) {
  for (j=i+t;j<n-t; j++) {
    l=0;
    while (i+l<j && j+l<n && b[i+l] == b[j+l])
      l++;
    if (l>t) {
      print "Sequence of length " + l + " found at " + i + " and " + j);
      if (l>l_max) {
        l_max = l;
        i_max = i;
        j_max = j;
      }
    }
  }
}
if (l_max>t) {
  print "longest common subsequence found at " + i_max + " and " + j_max + " (" + l_max + " long)";
}


Basically: 


Start at the beginning of the data, iterate until within 2*t of the end (no possible way to have two distinct subsequences of length t in less than 2*t of space!)
For the second subsequence, start at least t bytes beyond where the first sequence begins.
Then, reset the length of the discovered subsequence to 0, and check to see if you have a common character at i+l and j+l. As long as you do, increment l.
When you no longer have a common character, you have reached the end of your common subsequence. 
If the subsequence is longer than your threshold, print the result.

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     上一页
1
2
3
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复