Overlapping count of substring in a string in Python

I want to count all occurrences (overlapping and non-overlapping) of a substring in a string. I found two answers, one of which uses regex, which is not my intention, and t…
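For reference, Python's built-in `str.count` only counts non-overlapping occurrences, which is why it does not answer the question on its own. The snippet below contrasts it with a check at every start position, using the example string from the answers.

```python
text, sub = "ababaa", "aba"

# str.count resumes scanning after the end of each match, so it only sees
# non-overlapping occurrences: the match at index 2 overlaps the one at
# index 0 and is not counted.
print(text.count(sub))  # prints 1

# Checking every start position finds the overlapping occurrences too.
starts = [i for i in range(len(text) - len(sub) + 1) if text[i:i + len(sub)] == sub]
print(starts)  # prints [0, 2]
```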

8 answers

无人共我

2021-01-07 04:10

Here's a function you could use:

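```
def count(haystack, needle):
    # Build every substring haystack[i:j+1] (O(n^2) of them, so only practical
    # for short strings) and count those equal to needle. Uses range for Python 3.
    return len([x for x in [haystack[i:j+1] for i in range(len(haystack)) for j in range(i, len(haystack))] if x == needle])
```

Then:

```
>>> count("ababaa", "aba")
2
```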

不思量自难忘°

2021-01-07 04:14

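Here `string` is the text being searched:

```
# find('aba', x) gives the first match at or after index x; the set keeps distinct start positions.
count = len(set([string.find('aba', x) for x in range(len(string)) if string.find('aba', x) >= 0]))
```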
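The same `str.find` idea can avoid searching from every index (and calling `find` twice per index) by restarting the search one character past the start of each match it finds. A minimal sketch; the function name `count_overlapping_find` is hypothetical:

```python
def count_overlapping_find(haystack, needle):
    """Count overlapping occurrences of needle using repeated str.find calls."""
    count = 0
    start = haystack.find(needle)
    while start != -1:
        count += 1
        # Resume one character after this match's start so overlaps are found.
        start = haystack.find(needle, start + 1)
    return count

print(count_overlapping_find("ababaa", "aba"))  # prints 2
```

Each call to `find` jumps straight to the next match, so the loop runs once per occurrence rather than once per character.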

囚心锁ツ

2021-01-07 04:22
Another option is the `Counter` container. While the accepted answer is fastest for shorter strings, the `Counter` approach starts to gain the edge when you search for relatively short substrings within long strings. It also becomes more attractive if you refactor this to run multiple substring count queries against the same main string, because the `Counter` can then be built once and reused for every query of the same length.

For example, searching for a substring of length 3 gave me the following results with `timeit`:

| Main string length | Accepted answer | Counter approach |
|---|---|---|
| 6 characters | 4.1 µs | 7.4 µs |
| 50 characters | 24.4 µs | 25 µs |
| 150 characters | 70.7 µs | 64.9 µs |
| 1500 characters | 723 µs | 614 µs |

```
from collections import Counter

def count_w_overlap(search_string, main_string):
    # Every overlapping window of main_string with the same length as search_string
    search_len = len(search_string)
    candidates = [main_string[i:i+search_len] for i in range(0, len(main_string) - search_len + 1)]
    # Count how often each window occurs
    freq_count = Counter(candidates)
    return freq_count[search_string]
```
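As written, `count_w_overlap` rebuilds the `Counter` on every call, so the benefit for repeated queries only appears once it is built a single time and reused. A minimal sketch of that refactor, assuming every queried substring has the same length; the helper name `build_window_counter` is hypothetical:

```python
from collections import Counter

def build_window_counter(main_string, window_len):
    """Count every length-window_len slice of main_string, overlaps included."""
    return Counter(main_string[i:i + window_len]
                   for i in range(len(main_string) - window_len + 1))

# Build once, then answer many same-length queries with dictionary lookups.
windows = build_window_counter("ababaabab", 3)
print(windows["aba"], windows["bab"], windows["zzz"])  # prints 3 2 0
```

A missing key in a `Counter` returns 0, so substrings that never occur need no special handling. Queries of a different length would need a second counter.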
萌比男神i

2021-01-07 04:25

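Here, `re.finditer()` with a lookahead does what you want:

```
import re

def get_substring_count(s, sub_s):
    # (?=...) matches without consuming characters, so every start position is tried.
    # re.escape stops regex metacharacters in sub_s from being interpreted.
    return sum(1 for m in re.finditer('(?=%s)' % re.escape(sub_s), s))

get_substring_count('ababaa', 'aba')
# returns 2
```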
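Because `re.finditer` yields match objects, the same lookahead pattern also gives the start index of each occurrence through `m.start()`. A small sketch; the function name `substring_positions` is hypothetical:

```python
import re

def substring_positions(s, sub_s):
    """Return the start index of every (possibly overlapping) occurrence of sub_s."""
    # The lookahead consumes no characters, so the regex engine tries
    # every index and overlapping matches are all reported.
    pattern = "(?=%s)" % re.escape(sub_s)
    return [m.start() for m in re.finditer(pattern, s)]

print(substring_positions("ababaa", "aba"))  # prints [0, 2]
```

The count is then just the length of this list.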

清歌不尽

2021-01-07 04:25
A brute-force approach is just
```
n = len(needle)
count = sum(haystack[i:i+n] == needle for i in range(len(haystack)-n+1))
```
(This works because `bool` is a subclass of `int` in Python, so `True` and `False` behave as 1 and 0 in arithmetic, including `sum()`.)

Using a regular expression instead, it could be
```
import re

count = len(re.findall(re.escape(needle[:1]) + "(?=" + re.escape(needle[1:]) + ")",
                       haystack))
```
(that is, matching `a(?=ba)` instead of `aba`, so that overlapping matches are found too)
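The effect of the lookahead is easiest to see side by side: a plain pattern consumes every character it matches, so the scan resumes after the whole match and skips any occurrence that starts inside it.

```python
import re

haystack, needle = "ababaa", "aba"

# A plain pattern consumes "aba", so the search resumes at index 3
# and the overlapping match starting at index 2 is missed.
plain = re.findall(re.escape(needle), haystack)

# Matching only the first character and checking the rest with a
# lookahead consumes one character per match, so overlaps are found.
lookahead = re.findall(re.escape(needle[:1]) + "(?=" + re.escape(needle[1:]) + ")", haystack)

print(len(plain), len(lookahead))  # prints 1 2
```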
星月不相逢

2021-01-07 04:31

Looping through slices of the string:

```
def count_substring(string, sub_string):
    l = len(sub_string)
    n = len(string)
    # Count the length-l windows that equal sub_string (overlaps included).
    count = sum(1 for i in range(n - l + 1) if string[i:i + l] == sub_string)
    return count
```
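A variation on the same loop skips the per-position slice by using `str.startswith`, which accepts a start index. A minimal sketch; the name `count_substring_startswith` is hypothetical:

```python
def count_substring_startswith(string, sub_string):
    """Count overlapping occurrences of sub_string in string."""
    # str.startswith(prefix, start) tests for prefix at index start without
    # creating a new slice object; each True adds 1 to the sum.
    last_start = len(string) - len(sub_string)
    return sum(string.startswith(sub_string, i) for i in range(last_start + 1))

print(count_substring_startswith("ababaa", "aba"))  # prints 2
```

If `sub_string` is longer than `string`, `last_start` is negative, the range is empty, and the result is 0.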
