Detect whether sequence is a multiple of a subsequence in Python

轮回少年 2021-02-07 14:44

I have a tuple of zeros and ones, for instance:

(1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1)

It turns out:

(1, 0, 1, 1, 1, 0, 1, 1, 1,          


        
7 answers
  •  醉梦人生
    2021-02-07 14:47

    I believe I have an O(n) time solution (more precisely, about 2n + r steps, where n is the length of the tuple and r is the length of the sub-tuple). It does not use suffix trees; it uses a string-matching algorithm (such as KMP, which you should be able to find off the shelf).

    We use the following little-known theorem:

    If x,y are strings over some alphabet,
    
    then xy = yx if and only if x = z^k and y = z^l for some string z and non-negative integers k, l.
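The theorem can be checked by brute force on short binary strings. This is only a sanity check under stated assumptions, not a proof: the helper names `primitive_root`, `commute` and `check_theorem` are illustrative and do not come from the answer. For non-empty strings, "x and y are powers of a common z" is the same as "x and y have the same shortest repeating unit", which is what the sketch compares.

```python
from itertools import product


def primitive_root(s):
    """Return the shortest string z such that s == z * k for some k >= 1."""
    n = len(s)
    for p in range(1, n + 1):
        if n % p == 0 and s[:p] * (n // p) == s:
            return s[:p]
    return s  # only reached for the empty string


def commute(x, y):
    """True when xy == yx."""
    return x + y == y + x


def check_theorem(max_len=6):
    """Compare both sides of the theorem for all non-empty binary strings
    of length up to max_len."""
    words = ["".join(w)
             for n in range(1, max_len + 1)
             for w in product("01", repeat=n)]
    for x in words:
        for y in words:
            same_root = primitive_root(x) == primitive_root(y)
            if commute(x, y) != same_root:
                return False
    return True
```

Calling `check_theorem()` compares the two conditions for every pair of binary strings up to the chosen length.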
    

    I now claim that, for the purposes of our problem, this means that all we need to do is determine whether the given tuple/list (or string) is a non-trivial cyclic shift of itself, that is, whether rotating it by some amount strictly between 0 and its length gives back the same sequence.

    To determine whether a string is a non-trivial cyclic shift of itself, we concatenate it with itself (it does not even have to be a real concatenation, a virtual one will do) and look for an occurrence of the original string that starts after position 0.
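The doubling check can be sketched in a few lines with Python's built-in `bytes.find`. This is a simplified stand-in for the KMP search, assuming every element is an integer in the range 0..255 (true for a tuple of zeros and ones); the function names are illustrative, and the running time of `find` depends on the interpreter's implementation rather than being the KMP bound.

```python
def smallest_period(bits):
    """Return the smallest shift p >= 1 at which bits matches itself inside
    bits + bits. The result equals len(bits) when bits is not a repetition."""
    data = bytes(bits)  # assumption: every element is an int in 0..255
    if not data:
        return 0
    # Start searching at offset 1 to skip the trivial match at offset 0.
    return (data + data).find(data, 1)


def repeating_unit(bits):
    """Return the shortest prefix whose repetition gives bits."""
    return tuple(bits[:smallest_period(bits)])
```

A result shorter than the input means the input is a repetition of that prefix; a result equal to the input means it is not.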

    For a proof of that, suppose the string is a non-trivial cyclic shift of itself.

    Then the given string y satisfies y = uv = vu for some non-empty u and v. Since uv = vu, we must have that u = z^k and v = z^l, and hence y = z^{k+l}, from the above theorem. The other direction is easy to prove.

    Here is the Python code. The method is called powercheck.

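    def powercheck(lst):
        """Return the shortest prefix of lst whose repetition gives lst."""
        count = 0
        position = 0
        # lst occurs in lst + lst at offset 0, and next at its smallest period.
        for pos in KnuthMorrisPratt(double(lst), lst):
            count += 1
            position = pos
            if count == 2:
                break

        # If lst is not a repetition, the second match is at len(lst),
        # so the whole list is returned.
        return lst[:position]


    def double(lst):
        """Yield the elements of lst twice, without building lst + lst."""
        for _ in range(2):
            for elem in lst:
                yield elem

    def main():
        print(powercheck([1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1]))

    if __name__ == "__main__":
        main()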
    

    And here is the KMP code which I used (due to David Eppstein).

    # Knuth-Morris-Pratt string matching
    # David Eppstein, UC Irvine, 1 Mar 2002
    
    def KnuthMorrisPratt(text, pattern):
    
        '''Yields all starting positions of copies of the pattern in the text.
    Calling conventions are similar to string.find, but its arguments can be
    lists or iterators, not just strings, it returns all matches, not just
    the first one, and it does not need the whole text in memory at once.
    Whenever it yields, it will have read the text exactly up to and including
    the match that caused the yield.'''
    
        # allow indexing into pattern and protect against change during yield
        pattern = list(pattern)
    
        # build table of shift amounts
        shifts = [1] * (len(pattern) + 1)
        shift = 1
        for pos in range(len(pattern)):
            while shift <= pos and pattern[pos] != pattern[pos-shift]:
                shift += shifts[pos-shift]
            shifts[pos+1] = shift
    
        # do the actual search
        startPos = 0
        matchLen = 0
        for c in text:
            while matchLen == len(pattern) or \
                  matchLen >= 0 and pattern[matchLen] != c:
                startPos += shifts[matchLen]
                matchLen -= shifts[matchLen]
            matchLen += 1
            if matchLen == len(pattern):
                yield startPos
    

    For your sample this outputs

    [1,0,1,1]
    

    as expected.
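As a self-contained cross-check that does not need the KMP generator, the same answer can be obtained from the KMP failure table (prefix function) alone. This is a different, commonly used formulation and not the code above: the smallest candidate period is `n - pi[n-1]`, and the sequence is a repetition only when that value divides `n`. The names `prefix_function` and `split_power` are illustrative.

```python
def prefix_function(seq):
    """pi[i] is the length of the longest proper prefix of seq[:i+1]
    that is also a suffix of it."""
    n = len(seq)
    pi = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and seq[i] != seq[k]:
            k = pi[k - 1]
        if seq[i] == seq[k]:
            k += 1
        pi[i] = k
    return pi


def split_power(seq):
    """Return (unit, count) such that unit repeated count times equals seq,
    with unit as short as possible."""
    n = len(seq)
    if n == 0:
        return seq, 0
    period = n - prefix_function(seq)[-1]
    if n % period != 0:
        period = n  # not a repetition: the unit is the whole sequence
    return seq[:period], n // period
```

Returning the repetition count alongside the unit makes the "not a repetition" case explicit: the count is 1.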

    I compared this against shx2's code (not the numpy one) by generating a random 50-bit string and then replicating it to bring the total length to 1 million. This was the output (the decimal numbers are the values returned by time.time()):

    1362988461.75
    
    (50, [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1])
    
    1362988465.96
    
    50 [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1]
    
    1362988487.14
    

    The above method took ~4 seconds, while shx2's method took ~21 seconds!

    Here is the timing code (shx2's method is called powercheck2).

    import random
    import time

    def rand_bitstring(n):
        """Return a list of n random bits."""
        rand = random.SystemRandom()
        return [rand.randint(0, 1) for _ in range(n)]

    def main():
        lst = rand_bitstring(50)*200000
        print(time.time())
        print(powercheck(lst))
        print(time.time())
        powercheck2(lst)
        print(time.time())
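For repeatable measurements, a small harness along the following lines may be more convenient than printing raw timestamps. It is a sketch under assumptions: `time_call`, `naive_unit` and `make_input` are hypothetical helpers, `naive_unit` is a simple quadratic baseline rather than either method compared above, and `time.perf_counter` is used because it is intended for measuring short durations.

```python
import random
import time


def time_call(func, *args):
    """Run func(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def naive_unit(lst):
    """Baseline: try every divisor of len(lst) as a candidate period."""
    n = len(lst)
    for p in range(1, n + 1):
        if n % p == 0 and lst[:p] * (n // p) == lst:
            return lst[:p]
    return lst


def make_input(unit_len=50, copies=2000, seed=0):
    """Build a repeated random bit list; a fixed seed keeps runs comparable."""
    rng = random.Random(seed)
    unit = [rng.randint(0, 1) for _ in range(unit_len)]
    return unit * copies


if __name__ == "__main__":
    data = make_input()
    unit, elapsed = time_call(naive_unit, data)
    print(len(unit), elapsed)
```

Any function that takes the list and returns the repeating unit can be passed to `time_call` in place of `naive_unit`.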
    
