Regex including overlapping matches with same start

忘掉有多难 2020-12-02 02:14

I am using Python 3.6.

My goal is to find all matches of a regex, including matches that overlap and/or start at the same position, for example:



        
1 Answer
  • 2020-12-02 03:14

    As I've said above, regex is primarily a linear, single-rule kind of engine: you can choose greedy or non-greedy matching, but you cannot get both from the same match attempt. Also, most regex engines do not support overlapping matches (and those that do mostly emulate it by re-running the search on substrings or by forcibly advancing the match position), because overlapping matches don't fit the regex model of consuming the input from left to right.
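    To make that limitation concrete, here is a small illustration using Python's standard `re` module on the sample string that appears later in this answer (my own sketch, not part of the original answer). A capturing group inside a zero-width lookahead `(?=(...))` is the common way to get overlapping matches out of `re.findall`, but each start position still yields at most one match, greedy or non-greedy.

```python
import re

data = "BADACBA"  # sample string reused from the example later in this answer

# A plain findall consumes what it matches, so matches never overlap:
# the greedy pattern swallows the whole string in a single match.
plain = re.findall(r"B.*A", data)

# A capturing group inside a zero-width lookahead consumes nothing, so the
# engine retries at every position and the captured matches may overlap.
# Each start position still produces at most one match, though.
greedy = re.findall(r"(?=(B.*A))", data)
lazy = re.findall(r"(?=(B.*?A))", data)

print(plain)   # ['BADACBA']
print(greedy)  # ['BADACBA', 'BA']
print(lazy)    # ['BA', 'BA']
```

    Neither variant ever returns `'BADA'`, even though it also starts at index 0 and matches `B.*A`: the engine commits to one greedy or one non-greedy result per start position.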

    If you only need simple overlapping matches that begin with one fixed substring and end with another, you can implement the search yourself:

    def find_substrings(data, start, end):
        """Return every substring of `data` that begins with `start` and ends with `end`."""
        result = []
        s_len = len(start)  # a shortcut for `start` length
        e_len = len(end)  # a shortcut for `end` length
        current_pos = data.find(start)  # find the first occurrence of `start`
        while current_pos != -1:  # loop while we can find `start` in our data
            # find the first occurrence of `end` after the current occurrence of `start`
            end_pos = data.find(end, current_pos + s_len)
            while end_pos != -1:  # loop while we can find `end` after the current `start`
                # slice up to the end of the `end` marker so it is included
                result.append(data[current_pos:end_pos + e_len])
                # advance by one character so overlapping `end` occurrences aren't skipped
                end_pos = data.find(end, end_pos + 1)
            # advance by one character so overlapping `start` occurrences aren't skipped
            current_pos = data.find(start, current_pos + 1)
        return result
    

    Calling it on a sample string yields:

    substrings = find_substrings("BADACBA", "B", "A")
    # ['BA', 'BADA', 'BADACBA', 'BA']
    

    But you'll have to modify it for more complex matches.
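    One way to generalize this to arbitrary patterns (a brute-force technique swapped in here, not the original answer's method) is to test every start/end pair with `Pattern.fullmatch(string, pos, endpos)`, which reports whether the pattern matches exactly the span `data[pos:endpos]`. The helper name `find_all_matches` is hypothetical. It makes O(n²) `fullmatch` calls, so it is only practical for short strings; it skips empty matches, and because `pos` does not cut off the text before it, `^` may not match at `pos` (slice with `data[i:j]` instead if you need each candidate treated as a standalone string).

```python
import re

def find_all_matches(pattern, data):
    """Hypothetical helper: return every non-empty substring of `data` that
    `pattern` matches in full, including overlapping matches and several
    matches that share the same start position."""
    regex = re.compile(pattern)
    result = []
    for i in range(len(data)):                 # every start position
        for j in range(i + 1, len(data) + 1):  # every end position after it
            # fullmatch with pos/endpos succeeds only if the whole span i..j matches
            if regex.fullmatch(data, i, j):
                result.append(data[i:j])
    return result

print(find_all_matches(r"B.*A", "BADACBA"))  # ['BA', 'BADA', 'BADACBA', 'BA']
```

    For the `B`...`A` case this returns the same list as `find_substrings`, but the pattern can be any regex.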
