Find shortest matches between two strings


走了就别回头了

I have a large log file, and I want to extract a multi-line string between two strings: start and end.

The following is sample from the

4 Answers

鱼传尺愫

2020-11-22 17:06
This regex should match what you want:
```
(start((?!start).)*?end)
```
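Use the re.findall method with the re.S (DOTALL) flag, which lets . match newlines, to get all the occurrences in a multi-line string. Because the pattern contains two capturing groups, findall returns a list of tuples; the first element of each tuple is the full match: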
```
re.findall('(start((?!start).)*?end)', text, re.S)
```
See a test here.
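
As a rough sketch of the same idea, the inner group can be made non-capturing so that findall returns plain strings instead of tuples. The names BLOCK_RE and find_blocks and the sample text are made up for illustration; they are not from the original answer.

```python
import re

# Tempered pattern: after "start", consume characters only while no new
# "start" begins at the current position, then stop at the first "end".
# re.S lets "." match newlines so blocks can span several lines.
BLOCK_RE = re.compile(r"start(?:(?!start).)*?end", re.S)


def find_blocks(text):
    """Return every start...end block that contains no inner 'start'."""
    return BLOCK_RE.findall(text)


sample = "start a\nstart b\nend c\nstart d\nend\n"  # made-up log text
print(find_blocks(sample))  # ['start b\nend', 'start d\nend']
```

The first "start a" line is skipped because a second start appears before any end, which is what makes each match the shortest one.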
醉酒成梦

2020-11-22 17:17

You could do (?s)start.*?(?=end|start)(?:end)?, then filter out everything not ending in "end".
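
A minimal sketch of that two-step approach, assuming the whole log fits in memory as one string. The helper name shortest_blocks and the sample text are hypothetical.

```python
import re

# Each match runs from a "start" up to the next "end" or the next "start",
# whichever comes first; the trailing "end" is included only if it is there.
CANDIDATE_RE = re.compile(r"(?s)start.*?(?=end|start)(?:end)?")


def shortest_blocks(text):
    """Keep only the candidates that were actually closed by 'end'."""
    return [m for m in CANDIDATE_RE.findall(text) if m.endswith("end")]


sample = "start a\nstart b\nend c\nstart d\nend\n"  # made-up log text
print(shortest_blocks(sample))  # ['start b\nend', 'start d\nend']
```

Candidates that were cut off by a later start (here "start a\n") do not end in "end", so the filter drops them.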

自闭症患者

2020-11-22 17:29
This is tricky to do because the re module only returns non-overlapping matches. The third-party regex module, which is installed separately from PyPI and is not part of the standard library, allows for overlapping matches.

https://pypi.python.org/pypi/regex

You'd want to use something like
```
regex.findall(pattern, string, overlapped=True)
```
If you're stuck with Python 2.x or an environment where you can't install regex, it's still possible with some trickery. One brilliant person solved it here:

Python regex find all overlapping matches?

Once you have all possible overlapping (non-greedy, I imagine) matches, just determine which one is shortest, which should be easy.
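
One way to sketch this with only the standard re module is the well-known lookahead trick: a zero-width lookahead wrapping a capturing group is tried at every position, so overlapping candidates are all reported. This is an illustration under that assumption, not necessarily the solution from the linked question; the helper name shortest_per_end is made up.

```python
import re

# The lookahead consumes nothing, so finditer tries every position and
# group 1 captures a lazy start...end candidate wherever one begins.
OVERLAP_RE = re.compile(r"(?=(start.*?end))", re.S)


def shortest_per_end(text):
    """For each 'end' position, keep the shortest candidate that reaches it."""
    best = {}
    for m in OVERLAP_RE.finditer(text):
        end_pos = m.end(1)
        candidate = m.group(1)
        if end_pos not in best or len(candidate) < len(best[end_pos]):
            best[end_pos] = candidate
    return [best[pos] for pos in sorted(best)]


sample = "start a\nstart b\nend c\nstart d\nend\n"  # made-up log text
print(shortest_per_end(sample))  # ['start b\nend', 'start d\nend']
```

Both "start a\nstart b\nend" and "start b\nend" finish at the same end, and grouping by end position keeps only the shorter of the two.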
旧巷少年郎

2020-11-22 17:33

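Do it in code with a basic state machine:

# in_block replaces the original name `open`, which shadows the builtin.
in_block = False
tmp = []
with open('logfile.txt') as fi:  # placeholder path, substitute your log file
    for ln in fi:
        if 'start' in ln:
            if in_block:
                tmp = []  # a newer start: drop the unfinished block
            else:
                in_block = True

        if in_block:
            tmp.append(ln)

        if 'end' in ln:
            in_block = False
            for x in tmp:
                print(x, end='')  # lines already end with a newline
            tmp = []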
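
If the blocks should be collected rather than printed, the same state machine could be wrapped in a generator. This is only a sketch: the function name iter_blocks and its parameters are hypothetical, and it matches markers by substring on whole lines, as the loop above does.

```python
def iter_blocks(lines, start="start", end="end"):
    """Yield each block of lines from the last 'start' line to the next 'end' line."""
    block = None  # None means we are currently outside a block
    for ln in lines:
        if start in ln:
            block = []  # a new start always restarts the block
        if block is not None:
            block.append(ln)
            if end in ln:
                yield "".join(block)
                block = None


sample = "start a\nstart b\nend c\nstart d\nend\n"  # made-up log text
for blk in iter_blocks(sample.splitlines(keepends=True)):
    print(blk, end="")
```

Because it accepts any iterable of lines, an open file object can be passed directly, so a large log never has to be loaded into memory at once.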
