Is there a way that I can find out how many matches of a regex are in a string in Python? For example, if I have the string \"It actually happened when it acted out of
The existing solutions based on findall
are fine for non-overlapping matches (and no doubt optimal except maybe for HUGE number of matches), although alternatives such as sum(1 for m in re.finditer(thepattern, thestring))
(to avoid ever materializing the list when all you care about is the count) are also quite possible. Somewhat idiosyncratic would be using subn
and ignoring the resulting string...:
def countnonoverlappingrematches(pattern, thestring):
return re.subn(pattern, '', thestring)[1]
the only real advantage of this latter idea would come if you only cared to count (say) up to 100 matches; then, re.subn(pattern, '', thestring, 100)[1]
might be practical (returning 100 whether there are 100 matches, or 1000, or even larger numbers).
Counting overlapping matches requires you to write more code, because the built-in functions in question are all focused on NON-overlapping matches. There's also a problem of definition, e.g, with pattern being 'a+'
and thestring being 'aa'
, would you consider this to be just one match, or three (the first a
, the second one, both of them), or...?
Assuming for example that you want possibly-overlapping matches starting at distinct spots in the string (which then would give TWO matches for the example in the previous paragraph):
def countoverlappingdistinct(pattern, thestring):
total = 0
start = 0
there = re.compile(pattern)
while True:
mo = there.search(thestring, start)
if mo is None: return total
total += 1
start = 1 + mo.start()
Note that you do have to compile the pattern into a RE object in this case: function re.search
does not accept a start
argument (starting position for the search) the way method search
does, so you'd have to be slicing thestring as you go -- definitely more effort than just having the next search start at the next possible distinct starting point, which is what I'm doing in this function.
this works fine
ptr_str = lambda pattern,string1 :print(f'pattern = {pattern} times = {len(re.findall(pattern,string1))}')
pattern = 'AGATC'
str='AAGGTAAGTTTAGAATATAAAAGGTGAGTTAAATAGATCATAGGTTATATTGT'
ptr_str(pattern,string1)
Have you tried this?
len( pattern.findall(source) )
I know this is a question about regex. I just thought I'd mention the count method for future reference if someone wants a non-regex solution.
>>> s = "It actually happened when it acted out of turn."
>>> s.count('t a')
2
Which return the number of non-overlapping occurrences of the substring
import re
len(re.findall(pattern, string_to_search))
To avoid creating a list of matches one may also use re.sub with a callable as replacement. It will be called on each match, incrementing internal counter.
class Counter(object):
def __init__(self):
self.matched = 0
def __call__(self, matchobj):
self.matched += 1
counter = Counter()
re.sub(some_pattern, counter, text)
print counter.matched