Regex including overlapping matches with same start

I am using Python 3.6.

My goal is to match a regex which may match multiple strings, overlapping and/or starting from the same position, for example:

1 Answer

逝去的感伤

2020-12-02 03:14

As I've said above, a regex engine is primarily linear and applies a single rule at a time: you can choose between greedy and non-greedy capture, but you cannot get both from one pattern. Also, most regex engines do not support overlapping matches (and even those that do more or less fake it with substrings or by forcibly moving the search position), because overlapping matches don't fit the regex philosophy either.
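To make the greedy / non-greedy trade-off concrete, here is a small sketch using Python's standard `re` module. The pattern `B.*A` and the sample string are assumptions chosen for illustration, not taken from the question. The lookahead variant is one common way to "fake" overlapping matches, and it still reports only one match per start position.

```python
import re

data = "BADACBA"  # sample input, assumed for illustration

# Greedy: one match, as long as possible.
greedy = re.findall(r"B.*A", data)

# Non-greedy: as short as possible, and matches never overlap.
lazy = re.findall(r"B.*?A", data)

# Zero-width lookahead with a capture group: the engine advances one
# character at a time, so matches may overlap, but each start position
# still yields a single (here greedy) match.
lookahead = re.findall(r"(?=(B.*A))", data)

print(greedy)     # ['BADACBA']
print(lazy)       # ['BA', 'BA']
print(lookahead)  # ['BADACBA', 'BA']
```

None of the three calls returns 'BADA', which starts at the same position as 'BA' and 'BADACBA'.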

If you're looking only for simple overlapping matches between two substrings, you can implement it yourself:

def find_substrings(data, start, end):
    result = []
    s_len = len(start)  # a shortcut for `start` length
    e_len = len(end)  # a shortcut for `end` length
    current_pos = data.find(start)  # find the first occurrence of `start`
    while current_pos != -1:  # loop while we can find `start` in our data
        # find the first occurrence of `end` after the current occurrence of `start`
        end_pos = data.find(end, current_pos + s_len)
        while end_pos != -1:  # loop while we can find `end` after the current `start`
            end_pos += e_len  # just so we include the selected substring
            result.append(data[current_pos:end_pos])  # add the current substring
            end_pos = data.find(end, end_pos)  # find the next `end` after the curr. `start`
        current_pos = data.find(start, current_pos + s_len)  # find the next `start`
    return result

Which will yield:

substrings = find_substrings("BADACBA", "B", "A")
# ['BA', 'BADA', 'BADACBA', 'BA']

But you'll have to modify it for more complex matches.
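One possible modification for more complex matches is to keep the brute-force idea but let a compiled pattern decide whether each candidate span matches. The sketch below is an assumption about how that could look, not part of the original answer: `all_matches` is a hypothetical helper, and it relies on `Pattern.fullmatch(string, pos, endpos)` from the standard `re` module. It tries every (start, end) pair, so it makes O(n^2) match attempts and is probably only practical for short inputs.

```python
import re


def all_matches(pattern, data):
    """Return every non-empty substring of `data` that `pattern` matches
    in full, including overlapping ones and ones sharing a start."""
    regex = re.compile(pattern)
    result = []
    for start in range(len(data)):
        # `end` begins at start + 1, so empty matches are skipped
        for end in range(start + 1, len(data) + 1):
            # fullmatch with pos/endpos tests exactly data[start:end]
            if regex.fullmatch(data, start, end):
                result.append(data[start:end])
    return result


print(all_matches(r"B.*A", "BADACBA"))
# ['BA', 'BADA', 'BADACBA', 'BA']
```

With the pattern `B.*A` this reproduces the output of `find_substrings("BADACBA", "B", "A")`, while accepting any pattern the `re` module supports.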
