How to print lines between two patterns, inclusive or exclusive (in sed, AWK or Perl)?

生来不讨喜 2020-11-21 05:37

I have a file like the following and I would like to print the lines between two given patterns PAT1 and PAT2.



        
8 answers
  • 2020-11-21 06:01

    Using grep with PCRE (where available) to print markers and lines between markers:

    $ grep -Pzo "(?s)(PAT1(.*?)(PAT2|\Z))" file
    PAT1
    3    - first block
    4
    PAT2
    PAT1
    7    - second block
    PAT2
    PAT1
    10    - third block
    
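    • -P Interpret the pattern as a Perl-compatible regular expression (PCRE). Not available in all grep variants
    • -z Treat input and output data as sequences of lines, each terminated by a zero byte (NUL) instead of a newline. Since a text file normally contains no NUL bytes, the whole file is read as a single record, and each match printed by -o is terminated by a NUL byte rather than a newline
    • -o Print only the matching parts of the input
    • (?s) DOTALL mode, i.e. the dot also matches newlines
    • (.*?) Non-greedy match
    • \Z Match only at the end of the string, or before a newline at the end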
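    Because of -z, consecutive matches are separated by NUL bytes, which may not be visible in a terminal. The sketch below, assuming GNU grep built with PCRE support, pipes the output through tr to turn each NUL into a newline so every match ends on its own line. The sample file is hypothetical (the question's file is not reproduced here): its marker lines are modeled on the outputs shown, and the filler lines 1, 2, 5, 6, 8 and 9 are assumptions. The helper name print_blocks is also hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes GNU grep with PCRE (-P) support.
# Hypothetical sample file modeled on the outputs above.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 1 2 PAT1 '3    - first block' 4 PAT2 5 6 \
    PAT1 '7    - second block' PAT2 8 9 PAT1 '10    - third block' > "$sample"

# Hypothetical helper: -z reads the whole file as one record and ends each
# -o match with a NUL byte; tr converts those NULs into newlines.
print_blocks() {
    grep -Pzo '(?s)(PAT1(.*?)(PAT2|\Z))' "$1" | tr '\0' '\n'
}

print_blocks "$sample"
```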

    Print lines between markers excluding end marker:

    $ grep -Pzo "(?s)(PAT1(.*?)(?=(\nPAT2|\Z)))" file
    PAT1
    3    - first block
    4
    PAT1
    7    - second block
    PAT1
    10    - third block
    
    • (.*?)(?=(\nPAT2|\Z)) Non-greedy match followed by a lookahead for \nPAT2 or \Z; the lookahead checks for the end marker without including it in the match

    Print lines between markers excluding markers:

    $ grep -Pzo "(?s)((?<=PAT1\n)(.*?)(?=(\nPAT2|\Z)))" file
    3    - first block
    4
    7    - second block
    10    - third block
    
    • (?<=PAT1\n) Positive lookbehind for PAT1\n: the start marker must precede the match but is not part of it

    Print lines between markers excluding start marker:

    $ grep -Pzo "(?s)((?<=PAT1\n)(.*?)(PAT2|\Z))" file
    3    - first block
    4
    PAT2
    7    - second block
    PAT2
    10    - third block
    
  • 2020-11-21 06:10

    Print lines between PAT1 and PAT2

    $ awk '/PAT1/,/PAT2/' file
    PAT1
    3    - first block
    4
    PAT2
    PAT1
    7    - second block
    PAT2
    PAT1
    10    - third block
    

    Or, using a flag variable:

    awk '/PAT1/{flag=1} flag; /PAT2/{flag=0}' file
    

    How does this work?

    • /PAT1/ matches lines containing this text, and /PAT2/ does likewise.
    • /PAT1/{flag=1} sets the flag when the text PAT1 is found in a line.
    • /PAT2/{flag=0} unsets the flag when the text PAT2 is found in a line.
    • flag is a pattern with the default action, which is to print $0: if flag is non-zero (here, 1), the line is printed. This way, it prints every line from the one where PAT1 occurs up to and including the next line where PAT2 is seen. It also prints the lines from the last match of PAT1 to the end of the file when no PAT2 follows it.
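    To watch the flag change, the hypothetical helper trace_flag below keeps the same rule order but, instead of printing only when flag is true, prints every line prefixed with the value flag has at that point. The lines prefixed with 1 are exactly the ones the flag command prints. The sample file is an assumption modeled on the outputs above.

```shell
#!/bin/sh
# Sketch: trace the flag value seen by the bare `flag` pattern.
# Hypothetical sample file modeled on the outputs above.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 1 2 PAT1 '3    - first block' 4 PAT2 5 6 \
    PAT1 '7    - second block' PAT2 8 9 PAT1 '10    - third block' > "$sample"

# Hypothetical helper: same order as '/PAT1/{flag=1} flag; /PAT2/{flag=0}',
# but the middle rule prints "flag line" for every line.
# flag+0 turns an unset flag into 0 instead of an empty string.
trace_flag() {
    awk '/PAT1/{flag=1} {print flag+0, $0} /PAT2/{flag=0}' "$1"
}

trace_flag "$sample"
```

    The PAT2 lines show 1 because the flag is only cleared after the line has been printed.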

    Print lines between PAT1 and PAT2 - not including PAT1 and PAT2

    $ awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' file
    3    - first block
    4
    7    - second block
    10    - third block
    

    This uses next to skip the remaining rules for the line that contains PAT1, so that line is not printed.

    This call to next can be dropped by reshuffling the blocks: awk '/PAT2/{flag=0} flag; /PAT1/{flag=1}' file.
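    A quick way to check that the two forms are equivalent is to run both on the same input and compare the results. In this sketch the helper names with_next and reordered, and the sample file, are hypothetical.

```shell
#!/bin/sh
# Sketch: compare the `next` form with the reordered form.
# Hypothetical sample file modeled on the outputs above.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 1 2 PAT1 '3    - first block' 4 PAT2 5 6 \
    PAT1 '7    - second block' PAT2 8 9 PAT1 '10    - third block' > "$sample"

# Hypothetical helper: `next` ends processing of the PAT1 line,
# so the trailing `flag` test never sees it.
with_next() {
    awk '/PAT1/{flag=1; next} /PAT2/{flag=0} flag' "$1"
}

# Hypothetical helper: `flag` is tested after PAT2 clears it
# and before PAT1 sets it, so neither marker line is printed.
reordered() {
    awk '/PAT2/{flag=0} flag; /PAT1/{flag=1}' "$1"
}

if [ "$(with_next "$sample")" = "$(reordered "$sample")" ]; then
    echo "same output:"
    with_next "$sample"
fi
```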

    Print lines between PAT1 and PAT2 - including PAT1

    $ awk '/PAT1/{flag=1} /PAT2/{flag=0} flag' file
    PAT1
    3    - first block
    4
    PAT1
    7    - second block
    PAT1
    10    - third block
    

    Because flag is tested at the very end, it sees the value just set on the current line: a PAT1 line turns the flag on and is printed, while a PAT2 line turns it off and is not.

    Print lines between PAT1 and PAT2 - including PAT2

    $ awk 'flag; /PAT1/{flag=1} /PAT2/{flag=0}' file
    3    - first block
    4
    PAT2
    7    - second block
    PAT2
    10    - third block
    

    Because flag is tested at the very beginning, it sees the value left by the previous line, before the current line can change it. Hence the closing pattern is printed but the opening one is not.
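    The placements discussed above can be compared side by side on one input. This is a sketch assuming a POSIX sh and a POSIX awk; the helper names both, neither, only_pat1 and only_pat2 are hypothetical, and the sample file is modeled on the outputs above.

```shell
#!/bin/sh
# Sketch: the position of the bare `flag` test decides which markers print.
# Hypothetical sample file modeled on the outputs above.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 1 2 PAT1 '3    - first block' 4 PAT2 5 6 \
    PAT1 '7    - second block' PAT2 8 9 PAT1 '10    - third block' > "$sample"

# Hypothetical helpers, one per ordering of the rules.
both()      { awk '/PAT1/{flag=1} flag; /PAT2/{flag=0}' "$1"; }  # test between the setters
neither()   { awk '/PAT2/{flag=0} flag; /PAT1/{flag=1}' "$1"; }  # test after clearing, before setting
only_pat1() { awk '/PAT1/{flag=1} /PAT2/{flag=0} flag' "$1"; }   # test last: sees this line's value
only_pat2() { awk 'flag; /PAT1/{flag=1} /PAT2/{flag=0}' "$1"; }  # test first: sees the previous value

for variant in both neither only_pat1 only_pat2; do
    echo "== $variant"
    "$variant" "$sample"
done
```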

    Print lines between PAT1 and PAT2 (markers excluded), skipping the lines from the last PAT1 to the end of the file if no PAT2 follows it

    This is based on a solution by Ed Morton.

    awk 'flag{
            if (/PAT2/)
               {printf "%s", buf; flag=0; buf=""}
            else
                buf = buf $0 ORS
         }
         /PAT1/ {flag=1}' file
    

    As a one-liner:

    $ awk 'flag{ if (/PAT2/){printf "%s", buf; flag=0; buf=""} else buf = buf $0 ORS}; /PAT1/{flag=1}' file
    3    - first block
    4
    7    - second block
    
    # note: the third block is missing because no PAT2 follows it
    

    This stores the selected lines in a buffer, which starts filling on the line after PAT1 and keeps growing until PAT2 is found. At that point, it prints the stored content and empties the buffer. Lines still buffered when the input ends are never printed, which is why an unterminated last block is dropped.
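    To see the effect of the buffer, the sketch below runs the same awk program on two hypothetical inputs: one whose last block has no closing PAT2, and a copy with a PAT2 appended. Only the second prints the third block. The helper name closed_blocks and both sample files are assumptions.

```shell
#!/bin/sh
# Sketch: the buffered program drops a last block that is never closed.
# Hypothetical sample files modeled on the outputs above.
unterminated=$(mktemp)
terminated=$(mktemp)
trap 'rm -f "$unterminated" "$terminated"' EXIT
printf '%s\n' 1 2 PAT1 '3    - first block' 4 PAT2 5 6 \
    PAT1 '7    - second block' PAT2 8 9 PAT1 '10    - third block' > "$unterminated"
# Same content plus a closing PAT2 for the third block.
{ cat "$unterminated"; echo PAT2; } > "$terminated"

# Hypothetical helper wrapping the buffered awk program shown above.
closed_blocks() {
    awk '
        # Inside a block: flush the buffer at PAT2, otherwise keep buffering.
        flag {
            if (/PAT2/) { printf "%s", buf; flag = 0; buf = "" }
            else        buf = buf $0 ORS
        }
        # Start buffering from the line after PAT1.
        /PAT1/ { flag = 1 }' "$1"
}

echo "== without final PAT2"; closed_blocks "$unterminated"
echo "== with final PAT2";    closed_blocks "$terminated"
```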
