Find shortest matches between two strings

走了就别回头了 2020-11-22 16:30

I have a large log file, and I want to extract a multi-line string between two strings: start and end.

The following is sample from the

4 answers
  • 2020-11-22 17:06

    This regex should match what you want:

    (start((?!start).)*?end)
    

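    Use the re.findall method with the single-line modifier re.S to get all the occurrences in a multi-line string. Because the pattern contains capture groups, each result is a tuple whose first element is the full match: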

    re.findall('(start((?!start).)*?end)', text, re.S)
    

    See a test here.
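For anyone who wants to try this locally, here is a minimal sketch of the same idea. The sample text is made up for illustration, and the inner group is changed to a non-capturing one (my tweak, not part of the answer above) so that findall returns plain strings instead of tuples.

```python
import re

# Made-up sample: two "start" markers before a single "end".
text = """noise
start first attempt
start second attempt
payload
end
more noise
"""

# (?!start). consumes one character only if "start" does not begin there,
# so a match can never contain a second "start" after the opening one.
# The inner group is non-capturing so findall returns whole matches.
pattern = re.compile(r"start(?:(?!start).)*?end", re.S)

matches = pattern.findall(text)
for m in matches:
    print(m)
```

The first "start" fails to match because the scan hits the second "start" before reaching "end", so only the innermost block is returned.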

  • 2020-11-22 17:17

    You could do (?s)start.*?(?=end|start)(?:end)?, then filter out everything not ending in "end".
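A rough sketch of that two-step approach, assuming the whole log fits in memory as one string. The function name `blocks_via_filter` is hypothetical and only used for illustration.

```python
import re

def blocks_via_filter(text):
    # Step 1: each candidate runs from a "start" up to whichever comes
    # first, the next "start" or the next "end" (the "end" is consumed).
    candidates = re.findall(r"(?s)start.*?(?=end|start)(?:end)?", text)
    # Step 2: candidates cut short by another "start" do not end in "end".
    return [c for c in candidates if c.endswith("end")]

sample = "start a\nstart b\nend\n"
result = blocks_via_filter(sample)
print(result)
```

Here the first candidate, "start a\n", is dropped by the filter because it was interrupted by the second "start".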

  • 2020-11-22 17:29

    This is tricky to do because the built-in re module only returns non-overlapping matches. The third-party regex module, which is installed separately from PyPI and is not part of the standard library, can return overlapping matches.

    https://pypi.python.org/pypi/regex

    You'd want to use something like

    regex.findall(pattern, string, overlapped=True)
    

    If you're stuck with Python 2.x or can't install regex, it's still possible with some trickery using only re. One brilliant person solved it here:

    Python regex find all overlapping matches?

    Once you have all possible overlapping (non-greedy, I imagine) matches, just determine which one is shortest, which should be easy.
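A sketch of how that could look with only the standard re module. I'm assuming the trick in question is the usual one of wrapping the pattern in a lookahead with a capture group; the function name `shortest_blocks` and the "keep the shortest candidate per end position" rule are my own choices for illustration, not something taken from the linked answer.

```python
import re

def shortest_blocks(text, start="start", end="end"):
    # A zero-width lookahead lets matches overlap: the engine tries every
    # position, and the capture group holds the non-greedy match from there.
    pattern = re.compile(
        r"(?=(%s.*?%s))" % (re.escape(start), re.escape(end)), re.S
    )
    best = {}  # end offset -> shortest candidate that stops there
    for m in pattern.finditer(text):
        stop = m.end(1)
        candidate = m.group(1)
        if stop not in best or len(candidate) < len(best[stop]):
            best[stop] = candidate
    return [best[k] for k in sorted(best)]

log = "start a\nstart b\nend\nx\nstart c\nend"
found = shortest_blocks(log)
print(found)
```

Grouping by end offset means each "end" marker yields exactly one block, the one opened by the nearest preceding "start".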

  • 2020-11-22 17:33

    Do it with code - basic state machine:

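    in_block = False  # True while we are between a start line and an end line
    block = []        # lines collected for the current candidate block
    with open('logfile.txt') as fi:  # placeholder path; point this at your log
        for ln in fi:
            if 'start' in ln:
                # A new start discards anything collected so far,
                # so only the shortest start..end span survives.
                block = []
                in_block = True

            if in_block:
                block.append(ln)

            if 'end' in ln:
                in_block = False
                for x in block:
                    print(x, end='')  # each line already ends with a newline
                block = []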
    