Regex, select closest match

Assume the following word sequence

BLA text text text  text text text BLA text text text text LOOK text text text BLA text text BLA

What I

3 Answers

挽巷

2020-11-29 11:51

Another way to extract the desired text is to use the tempered greedy token technique, which matches a series of individual characters that do not begin an unwanted string.

r'\bBLA\b(?:(?!\bBLA\b).)*\bLOOK\b'

\bBLA\b        : match 'BLA' with word boundaries
(?:            : begin non-capture group
  (?!\bBLA\b)  : negative lookahead asserts following characters are not
                 'BLA' with word boundaries
  .            : match any character
)              : end non-capture group
*              : execute non-capture group 0+ times
\bLOOK\b       : match 'LOOK' with word boundaries

Word boundaries are included to avoid matching words such as BLACK and TRAILBLAZER.
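
As a quick check, the pattern can be run against the sample sentence from the question with Python's `re` module. This is only a minimal sketch; the variable names (`text`, `pattern`, `closest`) are illustrative and not part of the original answer.

```python
import re

# Sample sentence from the question.
text = ("BLA text text text  text text text BLA text text text text "
        "LOOK text text text BLA text text BLA")

# Tempered greedy token: each '.' is consumed only if the word 'BLA'
# does not start at that position.
pattern = re.compile(r'\bBLA\b(?:(?!\bBLA\b).)*\bLOOK\b')

match = pattern.search(text)
closest = match.group() if match else None
print(closest)  # BLA text text text text LOOK
```

The first BLA cannot reach LOOK without crossing the second BLA, so the engine moves on and the match starts at the BLA closest to LOOK.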

栀梦

2020-11-29 12:08
```
(?s)BLA(?:(?!BLA).)*?LOOK
```
Try this. The (?s) flag lets . match newlines, and the lazy *? stops at the first LOOK after the BLA.

Alternatively, use
```
BLA(?:(?!BLA|LOOK)[\s\S])*LOOK
```
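This is safer: because the repeated group also refuses to step onto LOOK, the match cannot run past the first LOOK, and [\s\S] matches newlines without needing a flag.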
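
To compare the two patterns from this answer, here is a small sketch in Python. The multi-line input is made up for illustration (it is not from the question); it places the closest BLA and LOOK on different lines to show why newline handling matters.

```python
import re

# Hypothetical multi-line input: the closest BLA and LOOK are on different lines.
text = "BLA one\nBLA two\nthree LOOK four BLA"

# Lazy tempered dot; (?s) makes '.' match newlines as well.
lazy = re.compile(r'(?s)BLA(?:(?!BLA).)*?LOOK')
# Stricter variant: the repeated group may step onto neither BLA nor LOOK.
strict = re.compile(r'BLA(?:(?!BLA|LOOK)[\s\S])*LOOK')

lazy_match = lazy.search(text).group()
strict_match = strict.search(text).group()
print(repr(lazy_match))    # 'BLA two\nthree LOOK'
print(repr(strict_match))  # 'BLA two\nthree LOOK'

# Without (?s), '.' stops at the newline and nothing matches.
no_flag = re.search(r'BLA(?:(?!BLA).)*?LOOK', text)
print(no_flag)  # None
```

On this input both patterns return the same span. Neither uses word boundaries, so words such as BLACK would still count as BLA.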

Happy的楠姐

2020-11-29 12:12

Simply find the text between BLA and LOOK that does not contain BLA:

In : re.search(r'BLA [^(BLA)]+ LOOK', 'BLA text text text  text text text BLA text text text text LOOK text text text BLA text text BLA').group()
Out: 'BLA text text text text LOOK'

:-)
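
One caveat worth checking before relying on this pattern: `[^(BLA)]` is a character class, so it excludes the single characters `(`, `B`, `L`, `A` and `)` rather than the word BLA. It works on the sample sentence only because the text between the markers is lowercase letters and spaces. The sketch below uses a made-up input with a capital `A` between the markers and compares the result with the tempered pattern from the first answer; the variable names are illustrative.

```python
import re

char_class = re.compile(r'BLA [^(BLA)]+ LOOK')
tempered = re.compile(r'\bBLA\b(?:(?!\bBLA\b).)*\bLOOK\b')

# Hypothetical input: the word 'Alpha' contains an uppercase 'A'.
text = "BLA first BLA Alpha text LOOK rest"

class_result = char_class.search(text)
tempered_result = tempered.search(text).group()
print(class_result)     # None
print(tempered_result)  # BLA Alpha text LOOK
```

The character class rejects the `A` in `Alpha`, so no match is found, while the lookahead version only rejects the whole word BLA.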
