Question
Using a regex, I want to find all substrings of '01' that contain one or more digits, i.e. I want to get (in any order):
['0', '01', '1']
The problem is that regex matches don't usually pick out overlapping substrings:
>>> re.findall(r'\d+', '01')
['01']
A clever workaround (found here) uses a lookahead. But this still isn't satisfactory, because it finds only one match per starting position in the string (the longest one, since \d+ is greedy):
>>> re.findall(r'(?=(\d+))', '01')
['01', '1']
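The one-match-per-position limitation can be seen on a slightly longer input. In this sketch, '012' is an arbitrary example string. The greedy lookahead captures the longest digit run at each start index, and making the quantifier lazy only swaps that for the shortest run:

```python
import re

s = '012'  # example input, chosen only for illustration

# Greedy lookahead: one zero-width match per start index,
# capturing the longest digit run that begins there.
starts_and_runs = [(m.start(), m.group(1)) for m in re.finditer(r'(?=(\d+))', s)]
print(starts_and_runs)  # [(0, '012'), (1, '12'), (2, '2')]

# Lazy lookahead: still one match per start index, now the shortest run.
lazy_runs = re.findall(r'(?=(\d+?))', s)
print(lazy_runs)  # ['0', '1', '2']
```

Either way, two substrings that start at the same index, such as '0' and '01', never both come out of a single pass.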
The only way I can think of to solve this is to use the above solution and loop over every possible substring length:
import re

s = '01'
matches = []
for n in range(1, len(s) + 1):
    # one lookahead pass per substring length n
    matches += re.findall(r'(?=(\d{%i}))' % n, s)
# matches == ['0', '1', '01']
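The loop can be wrapped in a reusable function. This is a minimal sketch, and the name overlapping_digit_runs is hypothetical. Because it makes one regex pass per length, it scans the string len(s) times and returns the results grouped by length rather than by position:

```python
import re


def overlapping_digit_runs(s):
    """Hypothetical helper wrapping the loop above: for each length n,
    a lookahead pass collects every all-digit substring of exactly n chars."""
    matches = []
    for n in range(1, len(s) + 1):
        matches += re.findall(r'(?=(\d{%i}))' % n, s)
    return matches


print(overlapping_digit_runs('01'))   # ['0', '1', '01']
print(overlapping_digit_runs('012'))  # ['0', '1', '2', '01', '12', '012']
```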
Is there a better, built-in way to do this directly with a regular expression? Or are regexes simply not the right tool for this?
Thanks!
Answer 1:
An alternative that avoids regex: generate all substrings (using this answer, adapted to Python 3), then keep the ones that contain a digit:
Code:
def get_all_substrings(input_string):
    length = len(input_string)
    # every slice input_string[i:j+1] with 0 <= i <= j < length
    return [input_string[i:j+1] for i in range(length) for j in range(i, length)]

s = '01'
# keep substrings that contain at least one digit
strings = [sub for sub in get_all_substrings(s) if any(x.isdigit() for x in sub)]
Result:
>>> strings
['0', '01', '1']
>>> s = '0td1'
>>> [sub for sub in get_all_substrings(s) if any(x.isdigit() for x in sub)]
['0', '0t', '0td', '0td1', 'td1', 'd1', '1']
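The any() filter keeps substrings that contain at least one digit, which is why '0t' and 'td1' appear for '0td1'. If the intended reading is substrings made only of digits (what \d+ in the question matches), changing the filter is enough. A sketch, using str.isdecimal because, like \d on str patterns, it accepts exactly the Unicode category Nd (for ASCII input, isdigit behaves the same):

```python
def get_all_substrings(input_string):
    length = len(input_string)
    return [input_string[i:j+1] for i in range(length) for j in range(i, length)]


s = '0td1'

# Answer's filter: at least one digit somewhere in the substring.
with_a_digit = [sub for sub in get_all_substrings(s) if any(x.isdigit() for x in sub)]

# Stricter filter: every character is a decimal digit, matching \d+.
only_digits = [sub for sub in get_all_substrings(s) if sub.isdecimal()]

print(with_a_digit)  # ['0', '0t', '0td', '0td1', 'td1', 'd1', '1']
print(only_digits)   # ['0', '1']
```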
Answer 2:
You could use a simple regex, \d+, then create a powerset of each match (excluding the empty set). Here's a powerset function I wrote:
import itertools

def powerset(container, min_length=0):
    """
    Generate the powerset of container.

    A powerset is the set of all subsets of a given set, but this
    function is more flexible with input types. Output is an iterator
    of tuples.

    min_length is set to 0 to include the empty set, but can be
    set to 1 to exclude it.
    """
    for i in range(min_length, len(container)+1):
        yield from itertools.combinations(container, i)

import re

s = '01 eggs 98'
matches = re.findall(r'\d+', s)
result = [''.join(x) for match in matches for x in powerset(match, 1)]
print(result)  # -> ['0', '1', '01', '9', '8', '98']
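One caveat: itertools.combinations keeps elements in order but does not require them to be adjacent, so the powerset of a digit run gives its subsequences, not its substrings. The two coincide for runs of at most two digits, as in '01 eggs 98', but for '123' the powerset also yields '13'. The sketch below keeps the same \d+ first step and instead slices each run contiguously. The helper name contiguous_substrings is hypothetical and not part of the answer:

```python
import itertools
import re


def powerset(container, min_length=0):
    # Same logic as the answer's powerset, docstring omitted.
    for i in range(min_length, len(container) + 1):
        yield from itertools.combinations(container, i)


def contiguous_substrings(run):
    """Hypothetical helper: every non-empty contiguous slice run[i:j]."""
    return [run[i:j] for i in range(len(run)) for j in range(i + 1, len(run) + 1)]


via_powerset = [''.join(x) for x in powerset('123', 1)]
via_slices = contiguous_substrings('123')
print(via_powerset)  # ['1', '2', '3', '12', '13', '23', '123']  ('13' is not a substring)
print(via_slices)    # ['1', '12', '123', '2', '23', '3']

text = '01 eggs 123'
result = [sub for match in re.findall(r'\d+', text) for sub in contiguous_substrings(match)]
print(result)  # ['0', '01', '1', '1', '12', '123', '2', '23', '3']
```

Substrings that occur in more than one run (here '1') appear once per occurrence; wrap the result in set() if only distinct values are wanted.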
Source: https://stackoverflow.com/questions/59217366/how-do-i-find-all-overlapping-matches-of-variable-size