grep 3 latest occurrences and some lines around each occurrence

Question

I have a file like:

exception: anythinggggg...
exception: anythinggggg...
abchdhjsdhsd
ygsuhesnkc
exception: anythingggg...
exception: anything...
..
..

I want to grep the latest 2 occurrences of the keyword exception, along with the 3 lines before and the 3 lines after each of them.

I am using something like

grep -C 3 exception | tail -12

I am using tail -12 here because I want 6 lines per occurrence and the latest 2 occurrences. This works fine when the occurrences of exception are far apart, but it gives me useless lines if, say, the occurrences are on consecutive lines.

abdgjsd
abdgjsd
abdgjsd
abdgjsd
abdgjsd
abdgjsd
abdgjsd
abdgjsd
exception
exception
exception
abcd

In the above case, it gives me

abdgjsd
abdgjsd
abdgjsd
exception
exception
exception
abcd

however, what I want is

abdgjsd
exception
exception -----------------> OUTPUT FOR FIRST OCCURRENCE
exception
abcd

abdgjsd
abdgjsd
exception-----------------> OUTPUT FOR SECOND OCCURRENCE
exception
exception
abcd

Is there another way to do this? Preferably something in which I can also specify the number of occurrences, rather than just grepping lines and tailing some of the output.

Answer 1:

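The output you get is because grep stops printing context (-C) at the next match: when the context of one match overlaps the next match, the lines are printed once, as a single group. I don't see how to make it behave otherwise.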
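To see that behavior in isolation, here is a small sketch that rebuilds the 12-line sample from the question and runs grep on it. The helper name make_sample is only an illustration, and -C is assumed to be available (it is in GNU and BSD grep, but it is not required by POSIX).

```shell
# make_sample FILE: write the question's 12-line sample to FILE
# (hypothetical helper, only here to make the demo reproducible)
make_sample() {
    {
        for i in 1 2 3 4 5 6 7 8; do echo abdgjsd; done
        echo exception; echo exception; echo exception
        echo abcd
    } > "$1"
}

sample=$(mktemp)
make_sample "$sample"

# Three matches on consecutive lines: their contexts overlap, so grep
# prints lines 6 to 12 once, as one group, instead of three groups.
grep -C 3 exception "$sample"
grep -C 3 exception "$sample" | wc -l    # 7 lines, not 3 * 7 = 21
```

This is why counting lines with tail cannot reliably isolate the last N matches: the number of output lines per match is not fixed.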

The script below (written on the command line) reads the whole file and forms an array of lines. Then it goes through the array and, for each match, prints the matching line with the two lines before and after it, or fewer when the match is near the start or end of the array.

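perl -MList::Util=min,max -0777 -wnE'
    @m = split /\n/;                  # whole file is in $_; split it into lines
    for (0..$#m) {
        if ($m[$_] =~ /exception/) {
            $bi = max(0, $_-2);       # first index to print, not before the start
            $ei = min($_+2, $#m);     # last index to print, not past the end
            say for @m[$bi..$ei];
            say "---"
        }
    }
' input.txt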

The --- are printed for easier reviewing of output. This prints the desired output.

The -0777 option makes Perl slurp the whole file into the $_ variable, which is then split on newlines. The loop iterates over the array indices ($#m is the index of the last element of @m). $bi and $ei are the first and last indices to print; max and min clamp them so that they stay inside the array when a match is within two lines of its beginning or end.

The output can be piped to tail, but this can't be automated: if a match is within the last two lines there will be one or two fewer lines of output, so the input has to be known for a precise cut-off. Alternatively, find the indices of the matches in the script, @idx = grep { $m[$_] =~ /exception/ } 0..$#m;, and use them in the condition to print only the last two.
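The idea of collecting the match indices first and then printing only the last few can also be sketched in awk, for anyone who prefers to stay in the shell. This is not the Perl code above, just the same approach under the same assumption that the file fits in memory. The function name last_ctx and its argument order are made up for this example.

```shell
# last_ctx PATTERN COUNT CONTEXT FILE   (hypothetical helper)
# Print the last COUNT matches of PATTERN in FILE, each with up to
# CONTEXT lines before and after it, followed by a "---" separator.
last_ctx() {
    awk -v pat="$1" -v n="$2" -v c="$3" '
        { line[NR] = $0 }                  # keep every line in memory
        $0 ~ pat { hit[++hits] = NR }      # remember the line number of each match
        END {
            first = hits - n + 1           # index of the first match to report
            if (first < 1) first = 1
            for (h = first; h <= hits; h++) {
                lo = hit[h] - c; if (lo < 1)  lo = 1     # clip at start of file
                hi = hit[h] + c; if (hi > NR) hi = NR    # clip at end of file
                for (i = lo; i <= hi; i++) print line[i]
                print "---"
            }
        }
    ' "$4"
}

# Example: last 2 matches, 2 lines of context on each side
# last_ctx exception 2 2 input.txt
```

Because each match is printed as its own block, consecutive matches produce overlapping blocks, as the question asks, and the number of occurrences is a parameter rather than something derived from a line count.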

If you are going to use something like this I'd make it a script. Then read all lines into an array directly, provide command-line options (like -C in grep), etc.

Maintaining line-by-line processing would make the job far more complicated. We need to keep track of a match so that we can print the following lines once we read them. But here we need multiple such records -- for the next match(es) as well, if they come within the following lines to be printed.

Answer 2:

Here is a start:

$ cat tst.awk
BEGIN { bufSize = 5 }
{ updBuf(NR) }
/exception/ { rangeEndNrs[NR+int(bufSize/2)] }
NR in rangeEndNrs { prtBuf(NR) }
END { prtBuf(NR+1) }

function updBuf(nr) {
    buf[((nr-1)%bufSize)+1] = $0
}

function prtBuf(nr,     i) {
    for (i=1; i<=bufSize; i++) {
        print buf[((nr+i-1)%bufSize)+1]
    }
    print "---"
}

$ awk -f tst.awk file
abdgjsd
abdgjsd
exception
exception
exception
---
abdgjsd
exception
exception
exception
abcd
---
exception
exception
exception
abcd
abdgjsd
---

It works by just keeping a 5-line buffer of input lines and everywhere "exception" is found setting an indicator that 2 lines later the buffer should be printed, thus printing the "exception" line plus the 2 lines before and after it. You just need to massage it to handle the cases of "exception" occurring less than 2 lines from the start or end of the input file however you want that handled (see how the last output block above is, presumably undesirably, wrapping around the buffer since it's run off the end of the input file).
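One possible way to do that massaging, as a sketch rather than a finished tool: remember which line each pending block is centered on, print only from that center minus the context up to the current line, and flush the blocks that are still pending when the input ends. The wrapper name ctx_stream, the dueAt array, and the choice to clip blocks at the file boundaries are assumptions of this example, not part of tst.awk.

```shell
# ctx_stream PATTERN CONTEXT FILE   (hypothetical helper)
# Streaming variant: keeps only 2*CONTEXT+1 lines in memory and clips
# each block at the start and end of the input instead of wrapping.
ctx_stream() {
    awk -v pat="$1" -v c="$2" '
        # Print lines from (mid - c), but not before line 1, up to line last.
        function emit(mid, last,    i, lo) {
            lo = mid - c; if (lo < 1) lo = 1
            for (i = lo; i <= last; i++) print buf[i % size]
            print "---"
        }
        BEGIN { size = 2 * c + 1 }
        { buf[NR % size] = $0 }                # ring buffer of the last size lines
        $0 ~ pat { dueAt[NR + c] = NR }        # block for this match is due c lines later
        NR in dueAt { emit(dueAt[NR], NR); delete dueAt[NR] }
        END {
            # Matches within c lines of the end never became due: flush them,
            # in input order, with whatever trailing context exists.
            for (due = NR + 1; due <= NR + c; due++)
                if (due in dueAt) emit(dueAt[due], NR)
        }
    ' "$3"
}

# Example: ctx_stream exception 2 file
```

On the sample input the first two blocks are the same as above, and the last block ends at abcd instead of wrapping around to an earlier abdgjsd line.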

Source: https://stackoverflow.com/questions/44491833/grep-3-latest-occurences-and-some-lines-around-the-occurence

Tags

perl

unix

grep

tail