String count with overlapping occurrences

耶瑟儿~ 2020-11-21 23:25

What's the best way to count the number of occurrences of a given string, including overlapping ones, in Python? This is one way:

def function(string, str_to_search_f         


        
22 answers
  • 2020-11-21 23:36

    This is another example of using str.find(), but a lot of the answers make it more complicated than necessary:

    def occurrences(text, sub):
        c, n = 0, text.find(sub)
        while n != -1:
            c += 1
            n = text.find(sub, n+1)
        return c
    
    In []:
    occurrences('1011101111', '11')
    
    Out[]:
    5
    
  • 2020-11-21 23:36

    Given

    sequence = '1011101111'
    sub = "11"
    

    Code

    In this particular case:

    sum(x == tuple(sub) for x in zip(sequence, sequence[1:]))
    # 5
    

    More generally, for a substring of any length:

    windows = zip(*([sequence[i:] for i, _ in enumerate(sequence)][:len(sub)]))
    sum(x == tuple(sub) for x in windows)
    # 5
    

    Or, extended to generators:

    import itertools as it
    
    
    iter_ = (sequence[i:] for i, _ in enumerate(sequence))
    windows = zip(*(it.islice(iter_, None, len(sub))))
    sum(x == tuple(sub) for x in windows)
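
The shifted-slices idea above may be easier to follow when the windows are printed. This is only a minimal sketch on a short made-up input; the helper name count_overlapping is hypothetical and not part of the answer.

```python
# Build one slice per offset 0..len(sub)-1, then zip them into windows.
sequence = 'abcde'
shifted = [sequence[i:] for i in range(3)]
print(shifted)             # ['abcde', 'bcde', 'cde']
print(list(zip(*shifted)))
# [('a', 'b', 'c'), ('b', 'c', 'd'), ('c', 'd', 'e')]


def count_overlapping(sequence, sub):
    """Hypothetical helper: count windows equal to sub (overlaps included)."""
    shifted = [sequence[i:] for i in range(len(sub))]
    # zip stops at the shortest slice, so every window has len(sub) items.
    return sum(window == tuple(sub) for window in zip(*shifted))


print(count_overlapping('1011101111', '11'))  # 5
```

Because zip stops at the shortest slice, no partial window is produced at the end of the string.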
    

    Alternative

    You can use more_itertools.locate:

    import more_itertools as mit
    
    
    len(list(mit.locate(sequence, pred=lambda *args: args == tuple(sub), window_size=len(sub))))
    # 5
    
  • 2020-11-21 23:37
    s = "bobobob"
    sub = "bob"
    ln = len(sub)
    # range() instead of the Python 2-only xrange(), to match the print() call
    print(sum(sub == s[i:i+ln] for i in range(len(s)-(ln-1))))  # 3
    
  • 2020-11-21 23:38

    A fairly Pythonic way would be to use a list comprehension here, although it probably wouldn't be the most efficient.

    sequence = 'abaaadcaaaa'
    substr = 'aa'
    
    counts = sum([
        sequence.startswith(substr, i) for i in range(len(sequence))
    ])
    print(counts)  # 5
    

    The list would be [False, False, True, True, False, False, False, True, True, True, False], since it checks every index of the string, and because int(True) == 1, sum gives us the total number of matches.
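
To see the intermediate list directly, here is a small sketch of the same idea on a shorter, made-up input (the variable name flags is only for illustration):

```python
sequence = 'aaaa'
substr = 'aa'

# One boolean per start index: does substr begin at position i?
flags = [sequence.startswith(substr, i) for i in range(len(sequence))]
print(flags)       # [True, True, True, False]
print(sum(flags))  # 3, because True counts as 1 and False as 0
```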

  • 2020-11-21 23:40

    If the strings are large, consider Rabin-Karp. In summary:

    • a rolling window of substring size, moving over the string
    • a hash with O(1) overhead for adding and removing a character (i.e. moving the window by 1 char)
    • implemented in C, or relying on PyPy
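
The points above can be sketched in pure Python roughly as follows. This is an illustrative sketch, not a tuned implementation: the function name count_rabin_karp and the base and mod values are assumptions chosen for the example, and in CPython this loop will likely be slower than str.find(), which is why the last point matters.

```python
def count_rabin_karp(text, sub, base=256, mod=(1 << 61) - 1):
    """Count overlapping occurrences of sub in text using a rolling hash.

    base and mod are illustrative choices, not prescribed values.
    """
    n, m = len(text), len(sub)
    if m == 0 or m > n:
        return 0

    # Weight of the leading character of a window, used when removing it.
    high = pow(base, m - 1, mod)

    # Hash of the pattern and of the first window.
    target = window = 0
    for i in range(m):
        target = (target * base + ord(sub[i])) % mod
        window = (window * base + ord(text[i])) % mod

    count = 0
    for i in range(n - m + 1):
        # On a hash match, compare directly to rule out collisions.
        if window == target and text[i:i + m] == sub:
            count += 1
        if i + m < n:
            # Slide by one char in O(1): drop text[i], append text[i + m].
            window = ((window - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return count


print(count_rabin_karp('1011101111', '11'))  # 5
```

The window always advances by one position, so overlapping matches are counted naturally.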
  • 2020-11-21 23:41

    This might be faster, since str.find() does the comparison in C:

    def occurrences(string, sub):
        count = start = 0
        while True:
            start = string.find(sub, start) + 1
            if start > 0:
                count += 1
            else:
                return count
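
Whether it is actually faster depends on the input, so it may be worth measuring. A rough sketch with timeit, comparing it against a slice-based count; the names occurrences_find and occurrences_slices and the test string are assumptions made for this example, and no particular timings are implied:

```python
import timeit


def occurrences_find(string, sub):
    """The str.find() loop from above."""
    count = start = 0
    while True:
        start = string.find(sub, start) + 1
        if start > 0:
            count += 1
        else:
            return count


def occurrences_slices(string, sub):
    """Compare a slice at every index (pure-Python loop)."""
    ln = len(sub)
    return sum(sub == string[i:i + ln] for i in range(len(string) - ln + 1))


text = '1011101111' * 1000

# Both approaches should agree before their timings are compared.
assert occurrences_find(text, '11') == occurrences_slices(text, '11')

for fn in (occurrences_find, occurrences_slices):
    seconds = timeit.timeit(lambda: fn(text, '11'), number=20)
    print(fn.__name__, seconds)
```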
    