Extract Values between two strings in a text file

旧巷少年郎 2021-01-22 15:56

Let's say I have a text file with the content below:

fdsjhgjhg
fdshkjhk
 Start
     Good Morning
     Hello World
 End
dashjkhjk
dsfjkhk
Start
  hgjkkl
  dfghjjk
          


        
5 Answers
  • 2021-01-22 16:07

    You can do this with regular expressions. This excludes rogue Start and End lines:

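    import re

    # Read the whole file so the multi-line pattern can see every line
    with open('test.txt', 'r') as f:
        txt = f.read()

    # The lookahead sits before the leading whitespace so an indented Start is rejected too
    matches = re.findall(r'^\s*Start\s*$\n((?:^(?!\s*Start\s*$).*$\n)*?)^\s*End\s*$', txt, flags=re.M)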
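To see how a line-anchored pattern of this kind treats a Start that never gets its End, here is a minimal sketch run against an in-memory string instead of a file. The sample text and the helper name `blocks_between` are made up for illustration. The negative lookahead is placed before the leading whitespace so that an indented rogue Start cannot slip through by backtracking.

```python
import re

# Hypothetical sample: a complete block, a Start with no End, then another complete block.
SAMPLE = (
    "noise\n"
    " Start\n"
    "     Good Morning\n"
    "     Hello World\n"
    " End\n"
    "noise\n"
    "Start\n"
    "  orphaned line\n"
    "  Start\n"
    "   Good Evening\n"
    "End\n"
)

# Body lines may be anything except a line that consists only of "Start".
PATTERN = re.compile(
    r'^\s*Start\s*$\n((?:^(?!\s*Start\s*$).*$\n)*?)^\s*End\s*$',
    flags=re.M,
)

def blocks_between(text):
    """Return the body of every Start...End block as a list of stripped lines."""
    return [
        [line.strip() for line in body.splitlines()]
        for body in PATTERN.findall(text)
    ]

print(blocks_between(SAMPLE))
# [['Good Morning', 'Hello World'], ['Good Evening']]
```

The orphaned line is dropped because the first unindented Start never reaches an End before the next Start line appears.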
    
  • 2021-01-22 16:13

    If you don't expect to get nested structures, you could do this:

    import re

    # text is assumed to hold the file contents, e.g. read with open('test.txt').read()
    # match everything between "Start" and "End"
    occurences = re.findall(r"Start(.*?)End", text, re.DOTALL)
    # discard text before duplicated occurrences of "Start"
    occurences = [oc.rsplit("Start", 1)[-1] for oc in occurences]
    # optionally trim leading and trailing newlines
    occurences = [oc.strip("\n") for oc in occurences]
    

    Which prints

    >>> for oc in occurences: print(oc)
         Good Morning
         Hello World
       Good Evening
       Good
    

    If you want stricter matching, you can make \n part of the Start and End delimiters.
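One possible reading of that last suggestion, as a rough sketch: require a newline right after Start and right before End, so a delimiter has to end or begin a line. The function name `blocks_lazy` and the sample string are assumptions for illustration, and allowing spaces or tabs before End is my own choice to cope with indented files.

```python
import re

def blocks_lazy(text):
    """Non-greedy Start...End search with the newline folded into both delimiters."""
    # "Start" must be followed by a newline; "End" must be first on its line
    # (optionally indented with spaces or tabs).
    found = re.findall(r"Start\n(.*?)\n[ \t]*End", text, re.DOTALL)
    # A Start without an End gets swallowed by the match,
    # so keep only what follows the last "Start" inside each chunk.
    return [chunk.rsplit("Start\n", 1)[-1] for chunk in found]

# Hypothetical sample with one duplicated Start.
sample = (
    "a\n"
    " Start\n"
    "  Good Morning\n"
    "  Hello World\n"
    " End\n"
    "b\n"
    "Start\n"
    "  junk\n"
    "Start\n"
    "  Good Evening\n"
    "End\n"
)

print(blocks_lazy(sample))
# ['  Good Morning\n  Hello World', '  Good Evening']
```

This is still a substring search, so a line ending in a longer word such as "Restart" would also count as a Start. An empty block (Start immediately followed by End) is not matched by this pattern.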

  • 2021-01-22 16:22

    Great problem! This is a bucket problem where each Start needs an End.

    You got that result because the file contains two consecutive 'Start' lines.

    It's best to hold the lines in a bucket until 'End' is reached, and only then write them out.

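    bucket = []
    copy = False
    with open('scores.txt', 'r') as infile, open('testt.txt', 'w') as outfile:
        for line in infile:
            if line.strip() == "Start":
                # A new Start empties the bucket, discarding an unfinished block
                bucket = []
                copy = True
            elif line.strip() == "End":
                # Only now is the block known to be complete, so write it out
                for strings in bucket:
                    outfile.write(strings + '\n')
                bucket = []
                copy = False
            elif copy:
                bucket.append(line.strip())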
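The same bucket idea can be tried without touching the file system. The sketch below is a generator over any iterable of lines. The name `iter_blocks` is hypothetical, and the tail of the sample (the second Start, "Good Evening", "Good", End) is assumed from the outputs quoted in the answers, since the question's listing is cut off.

```python
def iter_blocks(lines):
    """Yield each completed Start...End block as a list of stripped lines."""
    bucket = None  # None means we are currently outside any block
    for line in lines:
        stripped = line.strip()
        if stripped == "Start":
            bucket = []  # a new Start discards whatever was collected so far
        elif stripped == "End":
            if bucket is not None:
                yield bucket  # the block is complete only when End arrives
            bucket = None
        elif bucket is not None:
            bucket.append(stripped)

# Sample input; the last five lines are an assumption (see above).
sample = """fdsjhgjhg
 Start
     Good Morning
     Hello World
 End
dashjkhjk
Start
  hgjkkl
  dfghjjk
Start
   Good Evening
   Good
End""".splitlines()

blocks = list(iter_blocks(sample))
print(blocks)
# [['Good Morning', 'Hello World'], ['Good Evening', 'Good']]
```

Using `None` for "outside a block" also makes a stray End harmless, because there is nothing to emit. An open file object can be passed in directly, since it iterates line by line.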
    
  • 2021-01-22 16:23

    You could keep a temporary list of lines, and only commit them after you know that a section meets your criteria. Maybe try something like the following:

    tmpLines = []
    copy = False
    with open('test.txt', 'r') as infile, open('testt.txt', 'w') as outfile:
        for line in infile:
            if line.strip() == "Start":
                # Start a fresh section, dropping lines from any unfinished one
                copy = True
                tmpLines = []
            elif line.strip() == "End":
                # Section is complete: commit the buffered lines, then reset
                copy = False
                outfile.writelines(tmpLines)
                tmpLines = []
            elif copy:
                tmpLines.append(line)
    

    This gives the output

         Good Morning
         Hello World
     Good Evening
     Good 
    
  • 2021-01-22 16:25

    Here's a hacky but perhaps more intuitive way using regex. It finds all text that exists between "Start" and "End" pairs; the capture group leaves the delimiters out, and the print call trims the surrounding whitespace.

    import re

    with open('test.txt', 'r') as infile:
        text = infile.read()

    # re.DOTALL lets "." cross line breaks, so a block can span several lines
    matches = re.findall(r'Start(.*?)End', text, re.DOTALL)
    for m in matches:
        print(m.strip())
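A capture group is a safer way to drop the delimiters than trimming them afterwards with `str.strip`, because `str.strip` treats its argument as a set of characters rather than as a prefix or suffix. The short sketch below shows the difference on a made-up one-line input; the variable names are illustrative only.

```python
import re

line = "Start rats End"

# strip('Start ') removes any leading or trailing run of the characters
# S, t, a, r and space, so it also eats the "rat" in "rats".
chained = line.strip('Start ').strip(' End')

# Capturing only the text between the delimiters leaves the content intact.
captured = [m.strip() for m in re.findall(r'Start(.*?)End', line, re.DOTALL)]

print(repr(chained))  # 's'
print(captured)       # ['rats']
```

On Python 3.9 or later, `str.removeprefix` and `str.removesuffix` are another option when the delimiters really must be removed after matching.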
    