How do I find all overlapping matches of variable size? [duplicate]

此生再无相见时 submitted on 2021-01-28 09:34:32

Question


Using a regex, I want to find all the substrings of '01' that consist of one or more digits, i.e. I want to get (in any order):

['0', '01', '1']

The problem is that regex matches don't usually pick out overlapping substrings:

>>> re.findall(r'\d+', '01')
['01']

A clever workaround (found here) involves using a lookahead. But this still isn't satisfactory, as it will only find one match per position in the string:

>>> re.findall(r'(?=(\d+))', '01')
['01', '1']

The only way I can think of to solve this is to use the above solution and loop over every possible substring length:

import re

s = '01'
matches = []
# Try every possible match length, collecting the overlapping matches of that length.
for n in range(1, len(s) + 1):
    matches += re.findall(r'(?=(\d{%i}))' % n, s)

Is there a better, built-in way to do this directly with the regular expression? Or are regexes perhaps not the right tool for this?

Thanks!


Answer 1:


An alternative that avoids regex: generate all the substrings (using this answer, adapted to Python 3) and keep those that contain a digit:

Code:

def get_all_substrings(input_string):
    length = len(input_string)
    return [input_string[i:j+1] for i in range(length) for j in range(i,length)]

s = '01'

strings = [sub for sub in get_all_substrings(s) if any(x.isdigit() for x in sub)]

Result:

>>> strings
['0', '01', '1']
>>> s = '0td1'
>>> [sub for sub in get_all_substrings(s) if any(x.isdigit() for x in sub)]
['0', '0t', '0td', '0td1', 'td1', 'd1', '1']
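
Note that the filter above keeps any substring containing at least one digit, so mixed substrings such as '0t' pass. If only all-digit substrings are wanted, as in the question's expected output, one possible variant is sketched below. The name `digit_substrings` is hypothetical and not part of the original answer, and the sketch assumes ASCII input, where `str.isdigit()` and `\d` agree.

```python
def get_all_substrings(input_string):
    # Same helper as in the answer: every contiguous, non-empty substring.
    length = len(input_string)
    return [input_string[i:j + 1] for i in range(length) for j in range(i, length)]


def digit_substrings(input_string):
    # Hypothetical helper: keep only substrings made up entirely of digits.
    # Assumes ASCII input; str.isdigit() also accepts some non-ASCII digit characters.
    return [sub for sub in get_all_substrings(input_string) if sub.isdigit()]


print(digit_substrings('01'))    # ['0', '01', '1']
print(digit_substrings('0td1'))  # ['0', '1']
```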



Answer 2:


You could use a simple regex, \d+, then create a powerset of each match (excluding the empty set). Here's a powerset function I wrote:

import itertools

def powerset(container, min_length=0):
    """
    Generate the powerset of container.

    A powerset is the set of all subsets of a given set, but this
    function is more flexible with input types. Output is an iterator
    of tuples.

    min_length is set to 0 to include the empty set, but can be
    set to 1 to exclude it.
    """
    for i in range(min_length, len(container)+1):
        yield from itertools.combinations(container, i)


import re
s = '01 eggs 98'
matches = re.findall(r'\d+', s)
result = [''.join(x) for match in matches for x in powerset(match, 1)]
print(result)  # -> ['0', '1', '01', '9', '8', '98']
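
One caveat: itertools.combinations yields subsequences, not only contiguous substrings, so for a match longer than two characters (e.g. '012') the powerset would also include '02', which is not a substring of the input. A minimal sketch that keeps only the contiguous substrings of each \d+ match, assuming that is what is wanted, might look like this. The helper name `contiguous_substrings` is hypothetical and not from the original answer.

```python
import re


def contiguous_substrings(text):
    # Hypothetical helper: yield every contiguous, non-empty slice of text.
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            yield text[start:end]


s = '012 eggs 98'
result = [sub
          for match in re.findall(r'\d+', s)
          for sub in contiguous_substrings(match)]
print(result)  # ['0', '01', '012', '1', '12', '2', '9', '98', '8']
```

The order differs from the powerset version (grouped by start position rather than by length), which should not matter if the results are wanted in any order.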


Source: https://stackoverflow.com/questions/59217366/how-do-i-find-all-overlapping-matches-of-variable-size
