Question
I have a string:
mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
What I want is a list of the substrings between the markers start="&marker1" and end="/\n". Thus, the expected result is:
whatIwant = ["The String that I want", "Another string that I want"]
I've read the answers here:
- Find string between two substrings [duplicate]
- How to extract the substring between two markers?
I tried the following, without success:
>>> import re
>>> mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
>>> whatIwant = re.search("&marker1(.*)/\n", mystr)
>>> whatIwant.group(1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
What could I do to resolve this? Also, my actual string is very long:
>>> len(myactualstring)
7792818
Answer 1:
Consider this option using re.findall:
import re
mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
matches = re.findall(r'&marker1\n(.*?)\s*/\n', mystr)
print(matches)
This prints:
['The String that I want', 'Another string that I want']
Here is an explanation of the regex pattern:
&marker1   match the marker
\n         a newline
(.*?)      match AND capture all content until reaching the first
\s*        optional whitespace, followed by
/\n        a / and a newline
Note that re.findall will only return what appears in the (...) capture group, which is what you are trying to extract.
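To illustrate the note about capture groups, here is a minimal sketch comparing the same pattern with and without the parentheses. The variable names whole and inner are only illustrative and are not part of the original answer.

```python
import re

mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"

# No capture group: findall returns each full match, markers included.
whole = re.findall(r'&marker1\n.*?\s*/\n', mystr)

# One capture group: findall returns only the captured text.
inner = re.findall(r'&marker1\n(.*?)\s*/\n', mystr)

print(whole)
print(inner)
```

With more than one capture group, re.findall returns a list of tuples instead, one tuple per match.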
Answer 2:
Regarding "What could I do to resolve this?", I would do:
import re
mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
found = re.findall(r"\&marker1\n(.*?)/\n", mystr)
print(found)
Output:
['The String that I want ', 'Another string that I want ']
Note that:
- & has no special meaning in this re pattern, so escaping it (\&) is optional; the escape is harmless and still matches a literal &.
- . matches anything except newlines.
- findall is a better choice than search if you just want a list of the matched substrings.
- *? is non-greedy. In this case .* would work too, because . does not match a newline, but in other cases you might end up matching more than you wish.
- I used a so-called raw string (r-prefixed) to make escaping easier.
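The points about . and greediness can be checked with a short sketch. It reuses the pattern from the question to show why search returned None, then uses the re.DOTALL flag (an addition for illustration, not part of the code above) to show a case where greedy and non-greedy matching differ.

```python
import re

mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"

# Pattern from the question: "." cannot consume the newline that follows
# "&marker1", so nothing matches and search() returns None.
failed = re.search("&marker1(.*)/\n", mystr)

# With re.DOTALL, "." also matches newlines, so greediness now matters.
greedy = re.findall(r"&marker1\n(.*)/\n", mystr, flags=re.DOTALL)
lazy = re.findall(r"&marker1\n(.*?)/\n", mystr, flags=re.DOTALL)

print(failed)  # None
print(greedy)  # a single match that runs through to the last "/\n"
print(lazy)    # one match per marker pair
```

The greedy version swallows the second marker into its only match, which is the "matching more than you wish" case.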
Read the re module documentation for a discussion of raw-string usage and the list of characters with special meaning.
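For an input as long as the one mentioned in the question, it may be preferable to process matches one at a time instead of building the whole list at once. The following is a sketch using re.finditer; the helper name iter_between_markers is hypothetical, and the strip() call is an assumption that the trailing space before the / is unwanted.

```python
import re

# Compile once so the pattern is reused across calls.
PATTERN = re.compile(r"&marker1\n(.*?)/\n")

def iter_between_markers(text):
    """Yield each substring found between the markers (hypothetical helper)."""
    for match in PATTERN.finditer(text):
        # Assumption: whitespace around the captured text is not wanted.
        yield match.group(1).strip()

mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
result = list(iter_between_markers(mystr))
print(result)
```

Here list() is used only to display the result; in a loop over a large string the generator can be consumed directly.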
Source: https://stackoverflow.com/questions/62342552/extract-all-substrings-between-two-markers