Detect whether sequence is a multiple of a subsequence in Python

前端未结

关注

 7  1613

I have a tuple of zeros and ones, for instance:

(1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1)

It turns out:

(1, 0, 1, 1, 1, 0, 1, 1, 1,


                      
              相关标签:


      
      
        
          7条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  醉梦人生        
                
              
                            
                2021-02-07 14:47
              
            
            
                                                                       
I believe I have an O(n) time solution (actually 2n+r, n is length of tuple, r is sub tuplle) which does not use suffix trees, but uses a string matching algorithm (like KMP, which you should find off-the shelf).

We use the following little known theorem: 

If x,y are strings over some alphabet,

then xy = yx if and only if x = z^k and y = z^l for some string z and integers k,l.


I now claim that, for the purposes of our problem, this means that all we need to do is determine if the given tuple/list (or string) is a cyclic shift of itself!

To determine if a string is a cyclic shift of itself, we concatenate it with itself (it does not even have to be a real concat, just a virtual one will do) and check for a substring match (with itself).

For a proof of that, suppose the string is a cyclic shift of itself.

The we have that the given string y = uv = vu.
Since uv = vu, we must have that u = z^k and v= z^l and hence y = z^{k+l} from the above theorem. The other direction is easy to prove.

Here is the python code. The method is called powercheck.

def powercheck(lst):
    count = 0
    position = 0
    for pos in KnuthMorrisPratt(double(lst), lst):
        count += 1
        position = pos
        if count == 2:
            break

    return lst[:position]


def double(lst):
    for i in range(1,3):
        for elem in lst:
            yield elem

def main():
    print powercheck([1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1])

if __name__ == "__main__":
    main()


And here is the KMP code which I used (due to David Eppstein).

# Knuth-Morris-Pratt string matching
# David Eppstein, UC Irvine, 1 Mar 2002

def KnuthMorrisPratt(text, pattern):

    '''Yields all starting positions of copies of the pattern in the text.
Calling conventions are similar to string.find, but its arguments can be
lists or iterators, not just strings, it returns all matches, not just
the first one, and it does not need the whole text in memory at once.
Whenever it yields, it will have read the text exactly up to and including
the match that caused the yield.'''

    # allow indexing into pattern and protect against change during yield
    pattern = list(pattern)

    # build table of shift amounts
    shifts = [1] * (len(pattern) + 1)
    shift = 1
    for pos in range(len(pattern)):
        while shift <= pos and pattern[pos] != pattern[pos-shift]:
            shift += shifts[pos-shift]
        shifts[pos+1] = shift

    # do the actual search
    startPos = 0
    matchLen = 0
    for c in text:
        while matchLen == len(pattern) or \
              matchLen >= 0 and pattern[matchLen] != c:
            startPos += shifts[matchLen]
            matchLen -= shifts[matchLen]
        matchLen += 1
        if matchLen == len(pattern):
            yield startPos


For your sample this outputs

[1,0,1,1]


as expected.

I compared this against shx2's code(not the numpy one), by generating a random 50 bit string, then replication to make the total length as 1 million. This was the output (the decimal number is the output of time.time())

1362988461.75

(50, [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1])

1362988465.96

50 [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1]

1362988487.14


The above method took ~4 seconds, while shx2's method took ~21 seconds!

Here was the timing code. (shx2's method was called powercheck2).

def rand_bitstring(n):
    rand = random.SystemRandom()
    lst = []
    for j in range(1, n+1):
        r = rand.randint(1,2)
        if r == 2:
            lst.append(0)
        else:
            lst.append(1)

    return lst

def main():
    lst = rand_bitstring(50)*200000
    print time.time()
    print powercheck(lst)
    print time.time()
    powercheck2(lst)
    print time.time()

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  无人及你        
                
              
                            
                2021-02-07 14:57
              
            
            
                                                                       
The following solution is O(N^2), but has the advantage of not creating any copies (or slices) of your data, as it is based on iterators.

Depending on the size of your input, the fact you avoid making copies of the data can result in a significant speed-up, but of course, it would not scale as well for huge inputs as algorithms with lower complexity (e.g. O(N*logN)).

[This is the second revision of my solution, the first one is given below.  This one is simpler to understand, and is more along the lines of OP's tuple-multiplication, only using iterators.]

from itertools import izip, chain, tee

def iter_eq(seq1, seq2):
    """ assumes the sequences have the same len """
    return all( v1 == v2 for v1, v2 in izip(seq1, seq2) )

def dup_seq(seq, n):
    """ returns an iterator which is seq chained to itself n times """
    return chain(*tee(seq, n))

def is_reps(arr, slice_size):
    if len(arr) % slice_size != 0:
        return False
    num_slices = len(arr) / slice_size
    return iter_eq(arr, dup_seq(arr[:slice_size], num_slices))

s = (1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1)
for i in range(1,len(s)):
    if is_reps(s, i):
        print i, s[:i]
        break


[My original solution]

from itertools import islice

def is_reps(arr, num_slices):
    if len(arr) % num_slices != 0:
        return False
    slice_size = len(arr) / num_slices
    for i in xrange(slice_size):
        if len(set( islice(arr, i, None, num_slices) )) > 1:
            return False
    return True

s = (1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1)
for i in range(1,len(s)):
    if is_reps(s, i):
        print i, s[:i]
        break


You can avoid the call to set() by using something like:

def is_iter_unique(seq):
    """ a faster version of testing len(set(seq)) <= 1 """
    seen = set()
    for x in seq:
        seen.add(x)
        if len(seen) > 1:
            return False
    return True


and replacing this line:

if len(set( islice(arr, i, None, num_slices) )) > 1:


with:

if not is_iter_unique(islice(arr, i, None, num_slices)):

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  逝去的感伤        
                
              
                            
                2021-02-07 14:58
              
            
            
                                                                       
This one is just a dumb recursive comparison in Haskell. It takes about one second for Knoothe's million long string (f a). Cool problem! I'll think about it some more.

a = concat $ replicate 20000 
    [1,1,1,0,0,1,0,1,0,0,1,0,0,1,1,1,0,0,
     0,0,0,0,1,1,1,1,0,0,0,1,1,0,1,1,1,1,
     1,1,1,0,0,1,1,1,0,0,0,0,0,1]

f s = 
  f' s [] where
    f' [] result = []
    f' (x:xs) result =
      let y = result ++ [x]   
      in if concat (replicate (div (length s) (length y)) y) == s
            then y
            else f' xs y

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  甜味超标        
                
              
                            
                2021-02-07 15:01
              
            
            
                                                                       
You can archive it in sublinear time by XOR'ing the rotated binary form for the input array:


get the binary representation of the array, input_binary
loop from i = 1 to len(input_array)/2, and for each loop, rotate the input_binary to the right by i bits, save it as rotated_bin, then compare the XOR of rotated_bin and input_binary.
The first i that yields 0, is the index to which is the desired substring.


Complete code:

def get_substring(arr):
    binary = ''.join(map(str, arr)) # join the elements to get the binary form

    for i in xrange(1, len(arr) / 2):
        # do a i bit rotation shift, get bit string sub_bin
        rotated_bin = binary[-i:] + binary[:-i]
        if int(rotated_bin) ^ int(binary) == 0:
            return arr[0:i]

    return None


if __name__ == "__main__":
    test = [1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1]
    print get_substring(test) # [1,0,1,1]

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  青春惊慌失措        
                
              
                            
                2021-02-07 15:07
              
            
            
                                                                       
Just a different approach to the problem

I first determine all the factors of the length and then split the list and check if all the parts are same

>>> def f(s):
    def factors(n):
        #http://stackoverflow.com/a/6800214/977038
        return set(reduce(list.__add__,
                ([i, n//i] for i in range(2, int(n**0.5) + 1) if n % i == 0)))
    _len = len(s)
    for fact in reversed(list(factors(_len))):
        compare_set = set(izip(*[iter(s)]*fact))
        if len(compare_set) == 1:
            return compare_set


>>> f(t)
set([(1, 0, 1, 1)])

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  梦毁少年i        
                
              
                            
                2021-02-07 15:08
              
            
            
                                                                       
Simplifying Knoothe's solution. His algorithm is right, but his implementation is too complex. This implementation is also O(n).

Since your array is only composed of ones and zeros, what I do is use existing str.find implementation (Bayer Moore) to implement Knoothe's idea. It's suprisingly simpler and amazingly faster at runtime.

def f(s):
    s2 = ''.join(map(str, s))
    return s[:(s2+s2).index(s2, 1)]

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     1
2
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复