How do I find all overlapping matches of variable size?

I want to use a regex to find every substring of '01' that consists of one or more digits (i.e. every substring matching \d+). In other words, I want to get (in any order):

['0', '01', '1']

The problem is that re.findall only returns non-overlapping matches, so the greedy \d+ consumes the whole string in one match:

>>> re.findall(r'\d+', '01')
['01']

A clever workaround (found here) is to put a capturing group inside a zero-width lookahead. This still isn't satisfactory, though, because it finds only one match per starting position in the string (for \d+, the longest one):

>>> re.findall(r'(?=(\d+))', '01')
['01', '1']

The only solution I can think of is to combine the lookahead trick with a loop over every possible substring length:

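import re

s = '01'
matches = []
# One lookahead pass per exact length n: \d{n} captures n digits at each position
for n in range(1, len(s) + 1):
    matches += re.findall(r'(?=(\d{%i}))' % n, s)
# matches == ['0', '1', '01']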
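For comparison, the same brute force can be written for an arbitrary pattern, not just \d+, by testing every slice of the string with Pattern.fullmatch. This is a minimal sketch; the helper name all_matching_substrings is hypothetical, and it tries all O(n^2) slices, so it is only reasonable for short strings:

```python
import re


def all_matching_substrings(pattern, s):
    """Return every non-empty substring of s that the whole pattern matches.

    Hypothetical helper (not part of the re module); brute force over all slices.
    """
    regex = re.compile(pattern)
    found = []
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            # fullmatch succeeds only if the pattern consumes all of s[i:j]
            if regex.fullmatch(s[i:j]):
                found.append(s[i:j])
    return found


print(all_matching_substrings(r'\d+', '01'))    # ['0', '01', '1']
print(all_matching_substrings(r'\d+', '0td1'))  # ['0', '1']
```

Matching against the slice s[i:j] (rather than passing pos/endpos) keeps anchors and lookarounds from seeing characters outside the substring.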

Is there a better, built-in way to do this directly with a regular expression? Or are regexes simply not the right tool for this?



An alternative that avoids regex altogether: generate every substring using this answer (adapted to Python 3), then keep the ones that contain a digit:


def get_all_substrings(input_string):
    """Return every non-empty substring, ordered by start index, then end index."""
    length = len(input_string)
    return [input_string[i:j+1] for i in range(length) for j in range(i,length)]

s = '01'

strings = [sub for sub in get_all_substrings(s) if any(x.isdigit() for x in sub)]


>>> strings
['0', '01', '1']
>>> s = '0td1'
>>> [sub for sub in get_all_substrings(s) if any(x.isdigit() for x in sub)]
['0', '0t', '0td', '0td1', 'td1', 'd1', '1']
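The any(...) filter keeps substrings that merely contain a digit, such as '0t'. If you only want substrings made up entirely of digits (what \d+ matches), a minimal change is to require every character to be a digit. This sketch uses str.isdecimal(), which accepts the Unicode category Nd characters that \d matches in str patterns, whereas str.isdigit() also accepts characters like '²'. The name digit_substrings is hypothetical:

```python
def get_all_substrings(input_string):
    # Same helper as in this answer: every non-empty slice, by start then end index
    length = len(input_string)
    return [input_string[i:j+1] for i in range(length) for j in range(i, length)]


def digit_substrings(s):
    # Keep only substrings in which every character is a decimal digit,
    # i.e. the substrings that re.fullmatch(r'\d+', sub) would accept
    return [sub for sub in get_all_substrings(s) if sub.isdecimal()]


print(digit_substrings('01'))    # ['0', '01', '1']
print(digit_substrings('0td1'))  # ['0', '1']
```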


You could use a simple regex, \d+, and then build the powerset of each match (excluding the empty set). Here's a powerset function I wrote:

import itertools

def powerset(container, min_length=0):
    """Generate the powerset of container.

    A powerset is the set of all subsets of a given set, but this
    function is more flexible with input types. Output is an iterator
    of tuples.

    min_length is set to 0 to include the empty set, but can be
    set to 1 to exclude it.
    """
    for i in range(min_length, len(container)+1):
        yield from itertools.combinations(container, i)

import re
s = '01 eggs 98'
matches = re.findall(r'\d+', s)
result = [''.join(x) for match in matches for x in powerset(match, 1)]
print(result)  # -> ['0', '1', '01', '9', '8', '98']
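One caveat: itertools.combinations keeps elements in order but not necessarily adjacent, so for a longer run such as '123' the powerset also yields '13', which is not a substring of s. Because every digit-only substring lies inside exactly one maximal run returned by re.findall(r'\d+', s), taking contiguous slices of each run instead gives exactly the substrings matching \d+. A minimal sketch (the function name is hypothetical):

```python
import re


def digit_substrings_by_run(s):
    result = []
    # Each \d+ match is a maximal run of digits; every digit-only substring lies inside one run
    for run in re.findall(r'\d+', s):
        # Enumerate contiguous slices run[i:j] only, so no gaps are introduced
        for i in range(len(run)):
            for j in range(i + 1, len(run) + 1):
                result.append(run[i:j])
    return result


print(digit_substrings_by_run('01 eggs 98'))  # ['0', '01', '1', '9', '98', '8']
print(digit_substrings_by_run('123'))         # ['1', '12', '123', '2', '23', '3']
```

Like the powerset version, this returns duplicate strings if the same digits occur in more than one run.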

