I am using Python 3.6.
My goal is to match a regex which may match multiple strings, overlapping and/or starting from the same position, for example:
As I've said above, regex is a primarily linear and single-rule-only kind of engine - you can choose between greedy capture or not, but you cannot select both. Also, most regex engines do not support overlapping matches (and even those who support it kind of fake it with substrings / forced head move) because it also doesn't fit regex philosophy.
If you're looking only for simple overlapping matches between two substrings, you can implement it yourself:
def find_substrings(data, start, end):
result = []
s_len = len(start) # a shortcut for `start` length
e_len = len(end) # a shortcut for `end` length
current_pos = data.find(start) # find the first occurrence of `start`
while current_pos != -1: # loop while we can find `start` in our data
# find the first occurrence of `end` after the current occurrence of `start`
end_pos = data.find(end, current_pos + s_len)
while end_pos != -1: # loop while we can find `end` after the current `start`
end_pos += e_len # just so we include the selected substring
result.append(data[current_pos:end_pos]) # add the current substring
end_pos = data.find(end, end_pos) # find the next `end` after the curr. `start`
current_pos = data.find(start, current_pos + s_len) # find the next `start`
return result
Which will yield:
substrings = find_substrings("BADACBA", "B", "A")
# ['BA', 'BADA', 'BADACBA', 'BA']
But you'll have to modify it for more complex matches.