Assume the following word sequence
BLA text text text text text text BLA text text text text LOOK text text text BLA text text BLA
What I
Another way to extract the desired text is to use the tempered greedy token technique, which matches a series of individual characters that do not begin an unwanted string.
r'\bBLA\b(?:(?!\bBLA\b).)*\bLOOK\b'
\bBLA\b : match 'BLA' with word boundaries
(?: : begin non-capture group
(?!\bBLA\b) : negative lookahead asserts following characters are not
'BLA' with word boundaries
. : match any character
) : end non-capture group
* : execute non-capture group 0+ times
\bLOOK\b : match 'LOOK' with word boundaries
Word boundaries are included to avoid matching words such as BLACK and TRAILBLAZER.
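As a quick check, the pattern can be run against the sample sequence from the question. This is only a minimal sketch; the variable names (`text`, `pattern`, `result`, `boundary_result`) and the second input containing BLACK are made up for illustration.

```python
import re

text = ("BLA text text text text text text BLA text text text text "
        "LOOK text text text BLA text text BLA")

# Tempered greedy token: consume one character at a time, but only
# while the upcoming characters do not form the whole word BLA.
pattern = re.compile(r'\bBLA\b(?:(?!\bBLA\b).)*\bLOOK\b')

match = pattern.search(text)
result = match.group() if match else None
print(result)  # BLA text text text text LOOK

# Hypothetical input: BLACK is skipped because of the word boundaries.
boundary_match = pattern.search("BLACK text BLA foo LOOK")
boundary_result = boundary_match.group() if boundary_match else None
print(boundary_result)  # BLA foo LOOK
```

The first and third BLA never produce a match: from the first, the token stops in front of the second BLA before any LOOK is reached, and no LOOK follows the third.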
(?s)BLA(?:(?!BLA).)*?LOOK
Try this. The (?s) flag lets . match newlines as well, and the lazy *? stops at the first LOOK.
Alternatively, to be safer, use
BLA(?:(?!BLA|LOOK)[\s\S])*LOOK
which matches across newlines without needing a flag and cannot run past the first LOOK.
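Both variants can be compared on input that spans a line break. The sketch below uses a made-up multi-line string, not the sample from the question, and the variable names are hypothetical.

```python
import re

# Hypothetical multi-line input: the wanted span crosses a line break.
text = "BLA one two BLA three\nfour LOOK five BLA six"

# (?s) lets '.' match newlines; the lazy *? stops at the first LOOK.
lazy = re.search(r'(?s)BLA(?:(?!BLA).)*?LOOK', text)

# [\s\S] matches any character, newline included, without a flag;
# excluding LOOK in the lookahead also stops at the first LOOK.
explicit = re.search(r'BLA(?:(?!BLA|LOOK)[\s\S])*LOOK', text)

lazy_result = lazy.group() if lazy else None
explicit_result = explicit.group() if explicit else None

print(repr(lazy_result))      # 'BLA three\nfour LOOK'
print(repr(explicit_result))  # 'BLA three\nfour LOOK'
```

Without (?s) or [\s\S], the dot would stop at the newline and neither BLA could reach LOOK.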
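Simply find the text between BLA and LOOK that contains no BLA. Note that [^(BLA)] is a character class: it excludes the individual characters (, B, L, A and ), not the word BLA, so this only works when the text in between contains none of those characters.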
In : re.search(r'BLA [^(BLA)]+ LOOK', 'BLA text text text text text text BLA text text text text LOOK text text text BLA text text BLA').group()
Out: 'BLA text text text text LOOK'
:-)
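To see where the character-class version stops working, here is a small sketch with a made-up input whose middle part contains an uppercase A. The lookahead form used for comparison is the tempered token from the earlier answer, adapted to the same spacing; all variable names are hypothetical.

```python
import re

# Hypothetical input where the wanted text contains an uppercase 'A'.
text = "BLA Alpha LOOK"

# [^(BLA)] excludes the single characters '(', 'B', 'L', 'A', ')',
# so the 'A' in 'Alpha' blocks the match.
char_class = re.search(r'BLA [^(BLA)]+ LOOK', text)

# The lookahead form rejects only the full sequence 'BLA'.
tempered = re.search(r'BLA (?:(?!BLA).)+ LOOK', text)

char_class_result = char_class.group() if char_class else None
tempered_result = tempered.group() if tempered else None

print(char_class_result)  # None
print(tempered_result)    # BLA Alpha LOOK
```

On the sample sequence from the question both forms return the same result, because the filler word "text" contains none of the excluded characters.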